PREFACE TO FIRST EDITION



This report summarizes the work of the Digital Computer Laboratory
of the University of Illinois on a study of the feasibility of constructing
a computer about one hundred times faster than present computers such as
Illiac using presently available components and techniques.

Some promising designs and design features considered by the Digital 
Computer Laboratory are either not discussed in this report or are relegated 
to somewhat subordinate places in the report because they involve equipment 
which in the opinion of this Laboratory has a low probability of being built 
so as to have acceptable reliability with presently available components and
techniques. 

The first part of this report, Chapters 1, 2 and 3, deals with
general features of the proposed computer and postulates the existence of
various components and parts. The second part of the report, Chapters 4, 5,
6, 7, and 8, discusses these parts in some detail, beginning with a
discussion of the circuits from which the different units of the computer
would be built. A summary of the proposed machine's specifications is found
on page ix.

The work herein reported contains results of efforts of all the
personnel of the Digital Computer Laboratory and some members of the
Computation Centre of the University of Toronto. The report was prepared by a
committee consisting of D. B. Gillies, R. E. Meagher, D. E. Muller, R. W.
McKay(1), J. P. Nash(2), J. E. Robertson and A. H. Taub (Chairman).

The study was supported jointly by the Atomic Energy Commission
and the Office of Naval Research under AEC Contract AT(11-1)-415, by the
University of Illinois and by the University of Toronto.

October, 1957 



PREFACE TO SECOND EDITION 

The distribution and subsequent mailings of the original printing
of this report exhausted the supply (300 copies). This printing is called
a second edition because Chapter 4 has been considerably revised. Many new
details as well as corrections are contained in this chapter on basic
circuits, which has been revised by W. J. Poppelbaum. Because circuit times
and other circuit constants are affected, numerical data are changed
elsewhere in the text.

A number of other corrections of only a minor nature are included
in this edition. The page numbers following Chapter 4 are changed with
respect to the first edition.

April, 1958 

1. On the staff of the University of Toronto in residence at the Digital
Computer Laboratory, University of Illinois, for the year 1956-57.

2. Resigned in April 1957.

SUMMARY AND CONCLUSIONS



On the basis of the work reported herein the Digital Computer 
Laboratory of the University of Illinois concludes that it is feasible to 
construct a very fast digital computer in which the transistor circuits 
developed in that Laboratory would be used.

The results of two design studies are discussed. One involves
a minimum of buffer storage in the form of transistor registers and is
outlined in the body of Chapter 3, while the other involves a moderate amount
of buffer storage in the form of a small-capacity, high-speed, random-access
buffer memory and is discussed in the appendix to Chapter 3. The
former design is emphasized because it is felt that its equipment
requirements can be presently met.

In it, two controls are used, arithmetic control and advanced
control, as well as buffer storage for instructions and operands, and by
such means various units of the computer are kept in simultaneous operation.
For example, B-modifications, memory accesses and complicated arithmetic
operations, such as a double-length, add-product instruction, may be
performed concurrently. Short but powerful inner loops may be stored outside
the memory and acted upon by the control. Many of the gains in speed in
the control and arithmetic unit are dependent upon asynchronous operation
of these units.

The relative speed of the proposed computer compared to that of 
existing machines depends upon the problem being solved. For problems 
dominated by arithmetic operations it is estimated that the proposed
computer will be 100 to 200 times faster than computers such as Illiac. For
problems dominated by logical and combinatorial operations, this factor of 
gain in speed will be at least 50. 

The proposed computer has a random-access word-arrangement memory
of 8192 words of 52 bits each with an access time of 1.5 µs.

The arithmetic unit is designed so that the digits of a multiplier
are sensed and acted upon in such a way that the use of the adder is reduced.
Furthermore, "carry registers" are used in this unit, and carries are
assimilated only when necessary. It is expected that the proposed computer
will have an average multiplication time between 3.5 and 4 µs, addition times
(with assimilation) of 0.3 µs, and division times of 7 to 20 µs.

The computer, aside from input-output facilities, will contain
approximately 15,400 transistors, 34,000 diodes, and 42,000 resistors. The
transistors are expected to be Western Electric transistors, type GF-45011.

The basic circuits built from these transistors have operation
times of from 5 to 40 $\times 10^{-9}$ second depending upon the circuit.

SUMMARY OF PROPOSED MACHINE'S SPECIFICATIONS



This report covers many aspects of high-speed computing machines. 
As the study shows, more than one decision concerning machine organization or
design is acceptable, and the best decision concerning each question must
remain unanswered until more detailed plans are drawn up. In this section, a
set of specifications is written down which serves to show in a concise way 
a definite machine based upon the best estimates now given in the text. 

The reliability of these estimates varies with different parts of 
the machine. Some parts may be greatly influenced by future developments and 
may be realized by techniques different from those now contemplated and, hence, 
will involve a number of active elements quite different from the amounts 
listed below. The operation-time estimates are based upon small-scale
experimental studies. Although factors affecting these times are taken into account
insofar as possible in the extrapolation to the large-scale prototype, it is 
necessary to recognize that unforeseen characteristics of the prototype may 
require revision of the time estimates. 

A tabular summary is presented where possible. 


1. Organization 

In accordance with the historical division of computing machines
the machine may be considered to have: an arithmetic unit, a memory, a
control and an input-output unit or units. However, the rather long memory
access time (1.5 µs) compared to the short addition time (0.32 µs) requires
temporary storage facilities intimately connected with the memory and the
arithmetic unit. These will be called fast-access registers or fast-access
memory. The interconnections are shown in block diagram form.

Reading from the memory is carried out at the same time the
arithmetic unit is in operation. Thus, the memory may load one of the order
registers and operand register X during the time the arithmetic unit is
performing a multiplication. Except for the brief period when the arithmetic
unit is receiving or sending data to the fast-access registers, the
arithmetic unit operates separately from the memory reading and writing
operations. A part of the control called the advanced control arranges to
keep registers O1, O2, X, Y, and Z filled with data which the arithmetic unit
is expected to need. Combinations of orders such as multiply and add help
reduce accesses to the main memory.



[Block diagram: core memory with reading amplifier and writing circuit;
order registers O1 and O2; operand registers X, Y and Z; control and B-lines;
arithmetic unit (A, Q, M); fast-access registers.]

Figure 1. Block Diagram

2. Fast-Access Memory 

Three operand registers (X, Y and Z)        Non-shifting, 52 bits each
Two instruction registers (O1 and O2)       Non-shifting, 52 bits each
Twelve B-registers or counters              Non-shifting, 13 bits each
Word length                                 52 bits

Each word is 1) 52-bit fixed point number
          or 2) 42-bit floating point number, 10-bit exponent
          or 3) four 13-bit control groups.

A short instruction has 7 bits for function, 4 bits for local fast register
addressing, and 2 bits which are used for function or address bits.

A long instruction is a short instruction plus a 13-bit address.

Access time, fast memory to arithmetic unit     0.05 µs
Number of orders outlined in text               100

Equipment for Fast-Access Memory

Number of transistors                           2000 ± 25%
Number of diodes                                8000 ± 25%
Number of resistors                             6300 ± 30%
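
The bit budgets of the word and instruction formats can be checked with a
short sketch. The field order inside a 13-bit group (function, then register
address, then the two spare bits) and the names pack_short and unpack_short
are assumptions made for illustration; the specification gives only the field
widths.

```python
# Field widths taken from the fast-access memory specification.
WORD_BITS = 52
FUNCTION_BITS = 7      # function code
REGISTER_BITS = 4      # local fast-register address
SPARE_BITS = 2         # extra function or address bits
ADDRESS_BITS = 13      # address part of a long instruction

SHORT_INSTRUCTION_BITS = FUNCTION_BITS + REGISTER_BITS + SPARE_BITS
LONG_INSTRUCTION_BITS = SHORT_INSTRUCTION_BITS + ADDRESS_BITS


def pack_short(function, register, spare):
    """Pack three fields into one 13-bit group.

    The field order (most significant first: function, register, spare)
    is hypothetical; only the widths come from the report.
    """
    assert 0 <= function < 2 ** FUNCTION_BITS
    assert 0 <= register < 2 ** REGISTER_BITS
    assert 0 <= spare < 2 ** SPARE_BITS
    return (function << (REGISTER_BITS + SPARE_BITS)) | (register << SPARE_BITS) | spare


def unpack_short(group):
    """Inverse of pack_short under the same assumed field order."""
    spare = group & (2 ** SPARE_BITS - 1)
    register = (group >> SPARE_BITS) & (2 ** REGISTER_BITS - 1)
    function = group >> (REGISTER_BITS + SPARE_BITS)
    return function, register, spare
```

A short instruction therefore fills exactly one of the four 13-bit control
groups in a word, and a 13-bit address spans 2^13 = 8192 locations, which
matches the size of the main memory.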



3. Arithmetic Unit

The arithmetic unit is characterized by an adder with carry
completion sensing, a separate carry register so that carries are not
assimilated except when necessary, and a multiplication scheme which senses
two digits and shifts two places for each partial step. Recoding of the
multiplier digits reduces the number of uses of the adder during
multiplication to 52/3 ≈ 17 on the average.

Numbers

Number system                        Binary, parallel
Word length                          52 bits
Addition                             Separate carry register,
                                     carry completion sensing
Number representation                2's complement, fixed and
                                     floating point
Electrical bit representation        DC voltage
Multiplication                       Reversed ternary multiplier
                                     recoding
Multiplication shifts                Two at a time
Division                             Conventional non-restoring with
                                     remainder (type 1) or
                                     non-restoring with standardization
                                     of partial remainders (type 2)
Overflow detection                   Available on all orders

Registers

Accumulator and quotient             Shifting registers, 52 bits each
Number register (M)                  Shifting register, 52 bits
Operand registers and [...]          Non-shifting, 52 bits each

Fixed Point Operation Times without Access

Addition without carry assimilation          0.19 µs ± 10%
Addition with average carry assimilation     0.32 µs ± 10%
Multiplication without assimilation          3.8 µs average, ± 10%
                                             4.8 µs maximum, ± 10%
Division, type 1 without assimilation        10-20 µs ± 10%
Division, type 2 without assimilation        7-14 µs ± 10%
Average carry assimilation                   0.13 µs ± 10%
Maximum carry assimilation                   0.67 µs ± [...]

Equipment Cost for Arithmetic Unit

Number of transistors                        6,800 ± 25%
Number of diodes                             6,300 ± 25%
Number of resistors                          11,500 ± 30%



4. Control

The machine has an arithmetic control which forms instructions from
O1, O2, and the B-registers and properly sequences the arithmetic operations.
The advanced control looks at instructions just ahead but not yet executed by
the arithmetic unit and places additional data from the memory in O1, O2, X
or Y as necessary. Of course on conditional control transfer orders or other
orders dependent upon calculations in process, the advanced control must wait
for its information and no advantage is gained by the fast registers. Advanced
control places operands in X, Y from addresses contained in O1 or O2 and then
provides appropriate addresses to the arithmetic control to indicate that the
operands are in X and Y.

A memory control to properly sequence the reading and writing
from the memory and an input-output control are necessary.



Equipment for Controls

                        No. Transistors   No. Diodes       No. Resistors
Arithmetic control      1200 ± 30%        4000 ± 30%       5000 ± 50%
Advanced control        2400 ± 30%        8000 ± 30%       10,000 ± 50%
Memory control          1000 ± 40%        3500 ± 40%       4000 ± 60%
Input-output control    1000 ± 40%        3500 ± 40%       4000 ± 60%

Totals                  5600 ± 34%        19,000 ± 34%     23,000 ± 54%
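
The totals for the controls can be re-derived from the four rows. The sketch
below sums the nominal counts and, as an assumption, combines the tolerances
by adding the worst-case deviations linearly; the report does not say how
the tolerances on the totals were obtained. The name total_with_tolerance is
introduced here for illustration only.

```python
# (nominal count, fractional tolerance) for each control, from the table.
CONTROLS = {
    "arithmetic": {"transistors": (1200, 0.30), "diodes": (4000, 0.30), "resistors": (5000, 0.50)},
    "advanced": {"transistors": (2400, 0.30), "diodes": (8000, 0.30), "resistors": (10000, 0.50)},
    "memory": {"transistors": (1000, 0.40), "diodes": (3500, 0.40), "resistors": (4000, 0.60)},
    "input-output": {"transistors": (1000, 0.40), "diodes": (3500, 0.40), "resistors": (4000, 0.60)},
}


def total_with_tolerance(part):
    """Return (total count, fractional tolerance) for one kind of part.

    Assumption: the tolerance of the total is the sum of the individual
    worst-case deviations divided by the total.
    """
    total = sum(unit[part][0] for unit in CONTROLS.values())
    deviation = sum(unit[part][0] * unit[part][1] for unit in CONTROLS.values())
    return total, deviation / total


for part in ("transistors", "diodes", "resistors"):
    count, tolerance = total_with_tolerance(part)
    print(f"{part}: {count} +/- {tolerance:.1%}")
```

Under this assumption the nominal totals come out at 5600, 19,000 and 23,000,
with combined tolerances of roughly 34, 34 and 53 percent.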



5. Memory 

A core memory using the word-arrangement (or word addressing scheme)
is favored because this produces the shortest access times with the
conventional cores known to be available. In this addressing scheme, there is
a smaller drive current restriction than in the coincident current system and
hence the core turn-over time may be shortened.



Characteristics

Main internal random-access unit        8192 words, 52 bits each
Random-access time                      1.5 µs
Type of memory core                     General Ceramics S4,
                                        size F-394
Type of switch core                     General Ceramics S5,
                                        size F-394
Type of addressing                      By words

Equipment for Memory

Number of large vacuum tubes            300 ± 20%
Number of small vacuum tubes            1000 ± 20%
Number of transistors                   1000 ± 20%
Number of cores (two per bit stored)    851,968
Number of cores for switches            212,992

Auxiliary memories are listed under input-output.
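
The core counts follow directly from the memory size, and the nominal
transistor counts of the four equipment lists can be added for comparison
with the summary. The sketch below is only a check on the arithmetic; the
variable names are introduced here and are not the report's.

```python
WORDS = 8192
BITS_PER_WORD = 52
CORES_PER_BIT = 2   # "two per bit stored"

memory_cores = WORDS * BITS_PER_WORD * CORES_PER_BIT

# The listed number of switch cores works out to a whole number per word.
LISTED_SWITCH_CORES = 212_992
switch_cores_per_word = LISTED_SWITCH_CORES / WORDS

# Nominal transistor counts from the equipment lists of sections 2 to 5.
TRANSISTORS = {
    "fast-access memory": 2000,
    "arithmetic unit": 6800,
    "controls": 5600,
    "memory": 1000,
}
total_transistors = sum(TRANSISTORS.values())

print(memory_cores, switch_cores_per_word, total_transistors)
```

The memory core count reproduces the listed 851,968, the switch cores come
to 26 per word, and the transistor counts add to 15,400.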



6. Basic Circuits 

The basic circuits use diffused-base transistors with provision
to prevent the transistors from going into saturation at any time. This
produces the shortest transistor switch time. The circuits are more
complicated than the simplest circuits and require a larger number of parts.
The number of tolerance problems is necessarily greater, and these are being
checked with the Illiac. Diodes, of a particularly fast type, are also used
as switching elements.



Circuits                     Direct-coupled asynchronous
Switching elements           Western Electric GF-45011 transistors,
                             Qutronics Q5-250 or Q10-600 diodes
                             or equivalent.

                       No. of         No. of     No. of        Operation
                       Transistors    Diodes     Resistors     time, mµs
Flip flop              4              11         15            30
NOT                    2              4          5             15
AND                    2                         3             5
OR                     2              2          3             5
Double gate                           1,2        8
Half-adder             8              6          14            25
Complement circuit     6              2          9             10



Although these circuits, and the logical developments giving rise 
to the operation times listed under the arithmetic unit, are direct-coupled, 
they are not under all conditions asynchronous. Chapter 5 is devoted to a 
discussion of this subject and a rigorous definition for asynchronism is 
presented. Although the more rigorous definition requires more care in 
application and more parts in most circuits, there are good prospects that 
it can be done and is worthwhile. Succeeding developments may adopt this
new definition whenever possible.

7. Input-Output

For balance only the fastest possible types of input-output would
be appropriate. This includes only magnetic drums and the fastest magnetic
tape units. These units are greatly affected by commercial developments and
no specific items are listed.

Equipment for Off-Line Operation 

Punched card equipment 
Punched paper tape equipment 

Card and paper tape to magnetic tape converters 
Fast printer 

Equipment for On-Line Operation 

Magnetic tape units with 30 or more channels per tape
Magnetic drum, 50,000 words
Cathode ray tube display equipment

Switching elements need not have an operation time shorter than 1 µs.

CHAPTER 1

PROBLEMS REQUIRING FASTER COMPUTERS

1.1 Introduction 

Since the publication in 1946 of the fundamental report by A. W.
Burks, H. H. Goldstine and J. von Neumann(1), a class of machines based
directly on the ideas of this report has been built and put into operation.
Other machines influenced in various degrees by this report have also been
put into use. The experience obtained with these machines enables one to
begin to discern the limitations of present-day computers in carrying out
presently-formulated computational problems in various areas of applied
mathematics, the physical sciences, and engineering as well as those
computational problems arising in these and other fields whose solutions do
not yet have a precise formulation in terms of arithmetic algorithms. Problems
involving various types of partial or ordinary differential equations may
be considered to be of the first class whereas problems involving a large
degree of combinatorics and not necessarily arithmetic manipulation of various
quantities may be said to belong to the second class.

It is the purpose of this chapter to discuss some problems of each 
type referred to above and to attempt to characterize them in terms of the 
requirements they impose on the memory, arithmetic unit and control of a 
computer. We shall be mainly concerned with the size and speed the first 
two of these organs must have. 

In the analysis given below it will be assumed that for some problems, 
for example those which involve the solution of ordinary or partial differential 
equations, the time spent by present-day computers in obtaining the solution 
may be estimated by the formula: 

$$T = F \times (\text{multiplication time}) \times (\text{number of multiplications})$$

where F is a number between one and ten, the multiplication time is the time
necessary for the arithmetic unit to multiply two numbers (exclusive of the
time required to obtain these numbers from the memory), and the number of
multiplications is the total number occurring in the problem.

1. "Preliminary Discussion of the Logical Design of an Electronic Computing
Instrument," A. W. Burks, H. H. Goldstine, John von Neumann, Institute
for Advanced Study (Princeton, N.J.), June 1946.

Actually T, the time spent by a computer in solving a problem, is 
composed of four parts: 

(1) the input-output time: the time spent in transferring the 
problem into and out of the machine, 

(2) the instruction access time: the time spent in transferring 
instructions from the memory to the control, 

(3) the operand access time: the time spent in transferring
operands and partial results from the memory to the
arithmetic unit and the time spent in the reverse process,

(4) the arithmetic time: the time spent by the arithmetic unit
in doing useful and necessary arithmetic.

For present-day computers these times are additive, although logically 
they need not be. Indeed, one possibility that many designers are exploring is 
the possibility of overlapping these times by building machines capable of
performing a number of these tasks simultaneously, that is, in parallel.

As was pointed out by H. H. Goldstine(2), the formula for T given
above follows from the assumption that on the average, each multiplication
can be imbedded in a sequence of (A+1) instructions consisting of one
multiplication and A non-multiplicative instructions, each of which takes a
time L to execute. The time to execute these instructions is then

$$t = M + AL + (A+1)(a_i + a_o) = FM$$

where M is the multiplication time, and $a_i$ and $a_o$ are the instruction
and operand access times respectively. Let

$$F = 1 + \frac{AL}{M} + \frac{(A+1)(a_i + a_o)}{M}.$$

We then have

$$T = Nt = FMN$$

where N is the number of multiplications. The estimates on the size of F are
obtained by estimating the various quantities entering on the right-hand side
of the equation for F.

2. H. H. Goldstine: "Systematics of Automatic Electronic Computers,"
Proceedings of the Darmstadt Colloquium, October 1955.

Thus for a machine such as Illiac,

$$\frac{L}{M} = \frac{a_i + a_o}{M} = .08.$$

Hence if A = 11 we have

$$F \approx 2.8.$$
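
The relation between F and the instruction mix can be put in a few lines.
This is a minimal sketch; the function names are introduced here, and taking
both L/M and (a_i + a_o)/M equal to 0.08 for Illiac is an assumption about
how the quoted figure enters the formula.

```python
def overhead_factor(a, l_over_m, access_over_m):
    """F = 1 + A*(L/M) + (A+1)*((a_i + a_o)/M).

    a             -- number A of non-multiplicative instructions per multiplication
    l_over_m      -- ratio L/M of a non-multiplicative instruction time to M
    access_over_m -- ratio (a_i + a_o)/M of the combined access time to M
    """
    return 1.0 + a * l_over_m + (a + 1) * access_over_m


def solution_time(f, multiplication_time, multiplications):
    """T = F * M * N, in the units of multiplication_time."""
    return f * multiplication_time * multiplications


# Assumed Illiac-like ratios (see the lead-in): both ratios taken as 0.08.
f_illiac = overhead_factor(11, 0.08, 0.08)
print(round(f_illiac, 2))
```

With A = 11 the sketch gives F = 2.84, in line with the value of about 2.8
quoted for a machine such as Illiac.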

1.2 Hyperbolic and Parabolic Partial Differential Equations

We begin the listing of problems which are beyond the capabilities
of present-day computers with some time-dependent problems in hydrodynamics.
These problems involve the solution of a system of hyperbolic partial
differential equations in which the independent variables are one, two or
three space variables and time. If the integration is done in terms of
Lagrange coordinates, the number of dependent variables is 2m + 1 where m is
the number of spatial dimensions, since the Eulerian coordinates must be
computed, the velocity field must be determined, and one thermodynamic
variable in addition to the density must be calculated. Each of these 2m + 1
independent quantities must be evaluated as functions of the time.

The calculation can be arranged so that the value of any dependent
variable at time t + Δt at certain values of the spatial coordinates is
determined in terms of the values of some or all the dependent variables at
time t. However, only the values of the dependent variables at time t at
spatial points in a region near the point of interest are needed in this
calculation; that is, not all the data at time t is needed to calculate the
values of a dependent variable at a single spatial point at the time t + Δt.
This observation is important for the consideration of the memory
requirements of a computer.

The size of the time-step (i.e. the size of Δt) must be related
to the size of the spatial mesh. Thus in a one-dimensional problem in which
we evaluate the dependent variable at mesh points

$$X = X_0 + j\,\Delta X \quad (j \text{ an integer})$$

we must have

$$C\,\Delta t \le \Delta X$$

where C is the (variable) sound velocity.

If the problem is such that one would expect spatial variations in
the dependent variables similar to those in the function

$$\sin 2\pi k X,$$

then one should have 

2jtkAX < ~- . 

It is not unreasonable to require that the extent of each spatial variable be
divided into between 10 and 100 mesh points; that is,

$$\frac{1}{10} \ge \Delta X \ge \frac{1}{100}.$$

The amount of data which has to be stored at a given time is then

$$30 < D < 7 \cdot 10^{6},$$

where the lower limit holds for a one-dimensional problem with a 10-point
spatial mesh and the upper limit holds for a three-dimensional problem with a
cubical mesh having 100 points to a side.

The number of multiplications involved in the calculation of the
values of all the dependent variables at a mesh point depends on the number
of dimensions. It may be assumed to be 30 for a one-dimensional problem,
50 for a two-dimensional one and 70 for a three-dimensional one. Hence N,
the total number of multiplications per time-step, will be

$$9 \cdot 10^{2} < N < 4.9 \cdot 10^{8}.$$

The computation time in seconds to perform the required calculations for one
time-step with an existing computer with a multiplication time of
$5 \cdot 10^{-4}$ seconds and F = 2.5 is then

$$1.1 < T < 6 \cdot 10^{5}.$$

If the computation is to be done for a number of time-steps M sufficient for
signals to traverse the region under consideration K times, that is if

$$C M\,\Delta t \approx K m\,\Delta x$$

where m is the number of mesh points in each dimension, since
$C\,\Delta t \approx \Delta x$, we would have

$$M \approx Km.$$

Hence for a single problem the computation time in seconds would be

$$11K < t_c < 6K \cdot 10^{7}.$$

Quite often numerical surveys are made in which calculations are done again
and again, each time with various parameters assigned different values.

It is clear that for moderate values of K the time required for a
single calculation is excessive in the three-dimensional case. There are,
of course, many one-dimensional problems in which the number of mesh points
is considerably larger than 10 and K is of the order of 30. Thus even in a
one-dimensional problem it is quite possible to have

$$t_c \approx 25 \text{ hours}$$

for a single problem with an existing machine by taking m ≈ 170 and K = 30.

In two- and three-dimensional hydrodynamical problems done by use
of Lagrangian coordinates there arises a difficulty in the calculation due
to the fact that "particles" of the fluid which were originally neighbors
do not remain neighbors. This implies that after some time-steps the
calculation summarized above has to be interrupted and a new Lagrangian mesh
has to be formed. If this process is to be done by the computer, the time
estimates given above have to be increased. Moreover existing computers
are not very efficient in performing combinatorial manipulations required
in setting up the new Lagrangian net.
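
The storage and time estimates for the hyperbolic case, before any allowance
for re-forming the Lagrangian mesh, can be reproduced by a short sketch. It
follows the arithmetic of the text, in which the multiplication count per
mesh point is applied to each of the D stored quantities; the function name
estimate and its argument names are introduced here for illustration.

```python
MULT_TIME = 5e-4                        # seconds, existing computer
F = 2.5                                 # overhead factor used in the text
MULTS_PER_VALUE = {1: 30, 2: 50, 3: 70}  # by number of spatial dimensions


def estimate(dims, points_per_side, traversals):
    """Return (D, N, seconds per time-step, seconds per problem).

    dims            -- number of spatial dimensions m (1, 2 or 3)
    points_per_side -- mesh points in each dimension
    traversals      -- K, the number of times signals cross the region
    """
    variables = 2 * dims + 1
    data = variables * points_per_side ** dims      # D, words stored
    mults = MULTS_PER_VALUE[dims] * data            # N per time-step
    step_seconds = F * MULT_TIME * mults            # T = F * M * N
    steps = traversals * points_per_side            # number of steps, about K*m
    return data, mults, step_seconds, step_seconds * steps


print(estimate(1, 10, 1))     # lower limits, with K = 1
print(estimate(3, 100, 1))    # upper limits, with K = 1
print(estimate(1, 170, 30)[3] / 3600.0, "hours")
```

The first two calls reproduce the limits 30 and 7,000,000 for D, 900 and
490,000,000 for N, and about 1.1 and 6 x 10^5 seconds per time-step. The
one-dimensional case with 170 mesh points and K = 30 comes out at about 27
hours, of the same order as the figure quoted in the text.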

For parabolic differential equations the situation is much the same
as that for hyperbolic ones with one very important difference. Instead of
having the time-step linearly related to the spatial mesh size, we have

$$\Delta t \approx k(\Delta x)^{2}.$$

This introduces another power of the number of mesh points in the expression
for the number of multiplications required to integrate the equations for a
given interval of time.

In summary it may be said that there are many problems involving
hyperbolic and parabolic partial differential equations in which the total
number of multiplications involved is between $10^{7}$ and $10^{11}$ and that
in these problems 50,000 to 100,000 words of data are used. However the
calculation may be so organized that relatively small amounts of this data
are required at once. Therefore, it is not clear that a random access memory
sufficient to hold 100,000 words of data is required.



1.3 Elliptic Partial Differential Equations and Linear Equations 

The solution of elliptic partial differential equations may be reduced
to solving a set of linear equations:

$$Ax = b$$

for the n components of x, where A is a given n × n matrix and the n
components of b are given. When such a system of linear equations arises from
a partial differential equation, the dimensionality of the problem is related
to the number of mesh points used in approximating the differential operators
by difference operators. Values of n = 400 or greater are not uncommon.

The matrices A which arise in these problems usually have the
property that the non-zero elements in any row of the matrix are relatively 
few in number. Moreover the location of these non-zero elements relative 
to the diagonal element is the same for each row. However the values of 
these elements will not be the same unless the coefficients of the differential 
equation are constants. For non-linear differential equations, these 
coefficients are not the same on successive iterations. 

Because of these properties of the matrix A, the usual methods of
solving the linear equations involve iteration processes. If we assume that
20 iterations of a matrix are required and that there are 10 non-zero elements
in a given row of the 400 × 400 matrix we find that there are 80,000
multiplications required to find the solution of the problem once the
coefficients of the matrix are known. In some problems these quantities may
depend on their position in the matrix and on the approximate values of the
x's. Therefore on each iteration the coefficients of the matrix A have to be
evaluated. If the evaluation of each non-zero term involves 10
multiplications, we must perform 400 · 10 · 10 = 40,000 multiplications to
evaluate the matrix on each iteration and hence $4.4 \cdot 10^{4}$
multiplications per iteration or $8.8 \cdot 10^{5}$ multiplications per 20
iterations.
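
The multiplication counts for the iterative solution can be tabulated with a
small sketch. The function and argument names are introduced here; the
default values are the ones assumed in the text.

```python
def elliptic_multiplications(n=400, nonzeros_per_row=10,
                             mults_per_coefficient=10, iterations=20):
    """Return three counts of multiplications.

    1. total for all iterations when the matrix coefficients are known,
    2. per iteration when every non-zero coefficient is re-evaluated,
    3. total for all iterations with re-evaluation.
    """
    sweep = n * nonzeros_per_row                                # one pass over A
    evaluation = n * nonzeros_per_row * mults_per_coefficient   # rebuild A
    per_iteration = sweep + evaluation
    return sweep * iterations, per_iteration, per_iteration * iterations


print(elliptic_multiplications())
```

With the values of the text this gives 80,000 multiplications for fixed
coefficients, 44,000 per iteration when the matrix is re-evaluated, and
880,000 for 20 such iterations.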

Hence for a non-linear elliptic problem the number of multiplications 
is comparable to the number involved in hyperbolic and parabolic differential 
equations. However the amount of data that need be stored and manipulated may
be much smaller than in the latter cases.

1.4 Sorting and Meshing Problems

In scientific computations and in other problems one is quite often
confronted with the need of comparing sets of quantities represented by numbers
which are either stored in the machine or on some medium which may be read by
the machine. One way of handling such problems is to order the data in some
fashion. Because this problem is very common it is worthwhile discussing it
to see how time-consuming it can become and what factors affect the time
involved in completing it.

Von Neumann and Goldstine have described a method for reducing a
set of numbers (or a set of complexes of numbers) to a set ordered by size.
The method consists of an application of a meshing process: two sets of
ordered numbers are meshed as follows: let the first set of numbers be
$X_1, \ldots, X_n$ and the other set be $Y_1, \ldots, Y_m$. Then we form a
single set

$$Z_1, \ldots, Z_{n+m}$$

where

$$Z_k = X_i \text{ or } Y_j,$$

where $X_i$ is the first of the remaining X's, $Y_j$ is the first of the
remaining Y's, and k = i + j - 1. The choice as to whether an X or Y is
chosen depends on whether

$$X_i \ge Y_j$$

or

$$X_i < Y_j.$$

Sorting is accomplished by successive meshing. Thus the numbers
$X_1, \ldots, X_n$ are first meshed, as if they were groups consisting of
single elements, into n/2 groups of two elements:
$X_1, X_2;\; X_3, X_4;\; \ldots;\; X_{n-1}, X_n$. These are then meshed into
n/4 groups of four elements. This process continues until there are
$n/2^{k}$ groups of $2^{k}$ elements each. If n is a power of 2,
$n = 2^{m}$, the process stops when k = m.

The arithmetic being done in this problem consists of approximately

$$n \log_2 n$$

comparisons, for on every meshing operation there is a comparison for the
determination of each element of the new set. However, the time for
completing the sorting cannot be estimated accurately by multiplying this
quantity by the time for doing the arithmetical comparison. One must take
account of the time required for "red tape" operations (the time required to
locate and acquire the proper numbers for the comparison), and the time to
determine where the winner of the comparison test is to be placed, and to
store this number in the memory.
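
The meshing process and the sort built from it can be sketched as follows.
This is an illustration rather than the procedure of the cited authors:
ascending order is assumed, the smaller of the two leading elements is taken
at each step, and the names mesh and sort_by_meshing are introduced here.
The sketch also counts comparisons so that the bound of about n log2 n can
be observed.

```python
def mesh(xs, ys):
    """Mesh two ordered lists into one; return (merged list, comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0
    while i < len(xs) and j < len(ys):
        comparisons += 1
        if xs[i] <= ys[j]:          # take the first of the remaining X's
            merged.append(xs[i])
            i += 1
        else:                       # take the first of the remaining Y's
            merged.append(ys[j])
            j += 1
    merged.extend(xs[i:])           # one list is exhausted; copy the rest
    merged.extend(ys[j:])
    return merged, comparisons


def sort_by_meshing(values):
    """Sort by successive meshing of groups of 1, 2, 4, ... elements."""
    groups = [[v] for v in values]
    comparisons = 0
    while len(groups) > 1:
        next_groups = []
        for a in range(0, len(groups) - 1, 2):
            merged, used = mesh(groups[a], groups[a + 1])
            next_groups.append(merged)
            comparisons += used
        if len(groups) % 2:         # an unpaired group is carried forward
            next_groups.append(groups[-1])
        groups = next_groups
    return (groups[0] if groups else []), comparisons


ordered, used = sort_by_meshing([5, 3, 8, 1, 9, 2, 7, 4])
print(ordered, used)
```

For eight elements the number of comparisons stays below 8 · log2 8 = 24;
everything else the machine must do per element belongs to the "red tape"
discussed above.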

Thus, the time to complete the sorting will be of the form

$$K n \log_2 n$$

where K is a time usually much greater than the time to perform a simple
addition in a machine.

It is expected that in quite modest problems $n \approx 1000 \approx 2^{10}$
and, hence, the time to sort is $Kn \log_2 n \approx K \cdot 10^{4}$. As a
rough estimate, we may take K to be 10 times the addition time of the
computer and, hence, $K = 4 \cdot 10^{-4}$ seconds for a machine such as
Illiac. Thus, the sorting of 1000 pieces of data would take 4 seconds.
Although this time itself may appear small, it may have to be repeated a very
large number of times and may influence greatly the feasibility of doing a
particular problem on a computer. If $n \approx 10^{6}$, the time estimate
would be multiplied by $2 \cdot 10^{3}$ and would require 8,000 seconds or
2 hours to carry out the task.
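
The two sorting-time figures follow from the form K n log2 n. The sketch
below assumes K = 4 x 10^-4 seconds, the value implied by a sorting time of
4 seconds for 1000 items; the function name is introduced here.

```python
import math


def sort_time_seconds(n, k_seconds=4e-4):
    """Estimated sorting time K * n * log2(n) in seconds."""
    return k_seconds * n * math.log2(n)


print(sort_time_seconds(1000))        # about 4 seconds
print(sort_time_seconds(10 ** 6))     # about 8,000 seconds
```

The second figure is a little over two hours.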
CHAPTER 2



REQUIREMENTS ON A VERY FAST COMPUTER 

2.1 Memory Capacity Requirements 

In the discussion of problems described in sections 1.2 and! 1*3 we 
have seen that the total number of multiplication's, .may vary, between 10 • and • 
10 11 and that in some of these problems 50,000- oft- -100,000 words of data 
are used. However, not all of this data is needed at any one stage of the 
computation and therefore need not necessarily be stored .entirely,!© the; 1 
high-speed random-access memory. Indeed it is the purpose of this section to 
show that the use of a lower speed non-random-access memory (a "backup" memory) 
in conjunction with a faster but smaller memory does not involve a great per- 
centage increase of computing time for those problems satisfying the following 
property: 

The problem is such that if N words of data are in the high-speed memory 
we may then calculate N-k new words which replace the same number of words pre- 
viously held. Thus in one sweep through the memory N-k words are calculated and 
these may be stored in positions previously occupied by other words. If N-k is as 
large as required by the problem, this process is repeated many times and the 
total time for the calculation is 

    \[ (N-k) \cdot n \cdot F \cdot M \cdot T = \tau T \]

where

    n = the number of multiplications for each of the (N-k) words,

    T = the number of times the memory is swept through.

If, however, N-k is not as large as required we must add to this the
time necessary to load and unload the memory. Let us say that the time necessary
to unload the high-speed memory is made up of two parts: (a) a time to get
access to another memory, $\alpha$, and (b) a time to read N-k words into this memory,
$(N-k)\beta$. Thus unloading time is

    \[ \alpha + (N-k)\beta . \]

Assuming that reading and writing in the latter memory take equal times and that
we have to read N words from it in order to properly load the high-speed memory,
we have as our loading time

    \[ \alpha + N\beta . \]

Hence the total loading and unloading time is

    \[ 2(\alpha + (N-k)\beta) + k\beta . \]

The ratio of this time to the computation time per load in the memory
is

    \[ R = \frac{2(\alpha + (N-k)\beta) + k\beta}{(N-k)\,nFM} . \]

The total computation time is now

    \[ (1 + R)\,(N-k)\,nFM\,T . \]

Hence the ratio R is a measure of the cost of using a pair of memories instead
of a single memory. We may write

    \[ R = \frac{2}{nF} \left[ \frac{\beta}{M} + \frac{1}{N-k} \left( \frac{\alpha}{M} + \frac{k}{2}\,\frac{\beta}{M} \right) \right] . \]
In this formula n, N, and k depend on the problem being solved and
$\alpha$ and $\beta$ depend on the equipment being used. For a given problem we may
estimate the size of high-speed memory N so that R is of the order of 10%. The
important thing to note is that R is a slowly varying function of N beyond
the value of 10%.

Goldstine (1) has shown that for a problem in hydrodynamics with time and
two spatial variables for which 50,000 words of storage are needed, a core
memory with room for 1750 words of data requires less than 10% of computing time
for consulting the secondary memory when

    $\alpha / M$ = 1700 or less
    $\beta / M$ = .8 or less
    k = 500
    n = 20
    F = 2.5

However, Goldstine did not consider the complications ensuing when 
slip-streams occur in the computations. 

Suppose now we examine the question as to whether a random-access 
memory of 30,000-word capacity is needed. 

In two- and three-dimensional hydrodynamics problems in which the 
quantities are time-dependent, the amount of data which has to be processed in 
one time-step will exceed this capacity by a factor which is at least between 
one and two and may be five. Therefore a backup memory will be needed in any 
case. We are thus dealing with the factor R and the question is what is the
value of R for N = 30,000 and, say, N = 2,000.



1. H. H. Goldstine: "Systematics of Automatic Electronic Computers," 
Proceedings of the Darmstadt Colloquium, October, 1955. 






Suppose we have a drum in which

    $\alpha$ = 17000 µs = access-time to a word,
    $\beta$ = 8 µs = read-time for a word.

If then the multiplication time is assumed to be 5 µs,

    \[ \frac{\alpha}{M} = 3400 , \qquad \frac{\beta}{M} = 1.6 , \]

and

    \[ R = \frac{2}{nF} \left[ 1.6 + \frac{3400 + .8k}{N-k} \right] . \]

For k = 500

    \[ R_{30000} = .068 , \qquad R_{2000} = \text{[illegible]} \]

where $2/(nF) = .04$ (F = 2.5, n = 20).

Hence the ratio of the times to do the same problem with a memory of
32,768 words and one with 4768 words (2768 words of code being used) is

    \[ \frac{1 + R_{2000}}{1 + R_{30000}} \approx 1.07 . \]

Thus an almost eight-fold increase in the size of the memory gains only about
7% in time.
For k = 1000 we have

    \[ R_{30000} = \frac{2}{50} \left( 1.6 + \frac{4200}{29000} \right) = .07 \]

    \[ R_{2000} = \frac{2}{50} \left( 1.6 + \frac{4200}{1000} \right) = \frac{2}{50}\,(5.8) = .23 . \]

Here then we have a similar phenomenon. The ratio in times is

    \[ \frac{1.23}{1.07} = 1.15 , \]

a 15% speed-up being paid for by an almost eight-fold increase in size of memory.
The doubling of the factor k, the amount of dead-weight data being carried in
the transfers, is responsible for the change in time.
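
The ratio R for the case k = 1000 can be evaluated directly from its definition. The sketch below is illustrative only; the function name `backup_overhead` and its parameter names are hypothetical, and the default values are the drum figures assumed above ($\alpha/M = 3400$, $\beta/M = 1.6$, n = 20, F = 2.5).

```python
def backup_overhead(N, k, n=20, F=2.5, alpha_over_M=3400.0, beta_over_M=1.6):
    """Ratio R of load/unload time to computation time per memory load.

    R = [2(alpha + (N-k) beta) + k beta] / [(N-k) n F M], with alpha and
    beta expressed in units of the multiplication time M.
    """
    transfer = 2 * (alpha_over_M + (N - k) * beta_over_M) + k * beta_over_M
    return transfer / ((N - k) * n * F)

r_large = backup_overhead(N=30_000, k=1000)
r_small = backup_overhead(N=2_000, k=1000)
time_ratio = (1 + r_small) / (1 + r_large)
print(f"R(30000) = {r_large:.3f}, R(2000) = {r_small:.3f}, ratio = {time_ratio:.2f}")
```

Because R enters the total time only through the factor (1 + R), even a large change in N moves the total time by a modest percentage.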



2.2 The Class of Problems that can Use Backup Memories

It is clear from the discussion of parabolic and hyperbolic differential
equations that these can be formulated so that they satisfy the assumption made
in the previous section. That is, if N words of data are in the high-speed memory
we may then calculate N-k new words which replace the same number of words
previously held. Hence a backup memory can be used to good advantage on such problems.

It may also be used on algebraic problems arising in the solution of
elliptic partial differential equations or integral equations. Such problems
involve the solution of linear algebraic equations of the form

    \[ Ax - b = 0 \qquad (2.1) \]

    \[ Ax - \lambda x = 0 . \qquad (2.2) \]
These problems may be solved either by iterative methods or by direct
methods. The former are useful for large matrices in which many elements vanish.
In such cases the (n x n) matrix A, which may have $n^2$ elements, in reality is
represented by many fewer elements and may be stored in a relatively modest high-speed
memory. Thus for such cases where the matrix has relatively few non-zero elements
and where an iterative method is used, that is, the matrix A is not destroyed in
the process of solution, there is no apparent need for a very high-capacity
high-speed random-access memory.

Even when direct methods are used in solving linear equations such as
equation (2.1) the price paid for using a modest-capacity low-access-time memory
and a backup store in contrast to a high-capacity low-access-time memory need
not be excessive. This statement is justified as follows:

The direct method for solving equations (2.1) is the elimination method.
Let us suppose that the matrix we are dealing with has n rows and n columns. It
is to be reduced to a triangular matrix with the elements below the main diagonal
zero by multiplying the rows by suitable numbers and subtracting them from
subsequent ones.

We denote by p the capacity of the high-speed memory (for numerical 
storage) and write 

    \[ n = pm , \]

where we assume that m is an integer.

We propose to deal with m rows of the matrix A simultaneously. The
first m rows of the (n+1) x n augmented matrix are read into the machine and as
much elimination as can be done is done. The reduced data is then written on
the drum. The next m rows of the original matrix are read into the machine.
The first m rows are called back from the drum and used to reduce the second m rows;
these partially reduced rows are further reduced into their final form and then
written onto the drum. The third set of m rows is then called in and similar
processes are applied. This continues until the whole matrix has been
triangularized.
In this process the triangularized augmented matrix is written on
the drum, that is

    \[ W = \frac{n(n+3)}{2} \]

numbers are written onto the drum and

    \[ R = (p-1)\,\frac{m}{2}\,(2n + 3 - m) + (p-2)\,\frac{m}{2}\,(2n + 3 - 3m) + \cdots + \frac{m}{2}\,[\,2n + 3 - (2p-3)m\,] \]

numbers are read from the drum where

    \[ p = \frac{n}{m} . \]

The time taken to perform W and R is the extra cost in time paid for by not
having enough capacity in the high-speed memory.

Now

    \[ R = \frac{m}{2}(2n+3) \sum_{r=1}^{p-1} (p-r) - \frac{m^2}{2} \sum_{r=1}^{p-1} (p-r)(2r-1)
         = \frac{m}{2}(2n+3)\,\frac{p(p-1)}{2} - \frac{m^2}{2}\,\frac{p(p-1)(2p-1)}{6} \]

    \[ R = \frac{n(n-m)}{4m} \left[ 2n + 3 - \frac{2n-m}{3} \right]
         = \frac{n(n-m)}{4m} \left( \frac{4n}{3} + \frac{9+m}{3} \right) \approx \frac{n^3}{3m} . \]

We may neglect W in comparison to R.

The number of multiplications involved in the triangularization of a
matrix is to the same approximation

    \[ \frac{n^3}{3} . \]

Hence the ratio of the reading-from-the-drum time to the computation
time is

    \[ \frac{\beta}{mFM} \]

where we have assumed that we read in such large batches from the drum that
the factor involving $\alpha$ may be neglected.

If m is large enough this quantity may be made small so that the cost 
of not having a sufficiently high-capacity memory is not excessive even in this 
problem . 
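
The count of numbers read back from the drum can be verified by summing block by block and comparing with the closed form. This is a sketch only: the function names are hypothetical, and the values $\beta/M = 1.6$ and F = 2.5 are borrowed from Section 2.1 purely for illustration.

```python
def drum_reads(n, m):
    """Numbers read back from the drum when eliminating m rows at a time.

    Block r of reduced rows (r = 1, ..., p-1) holds (m/2)(2n + 3 - (2r-1)m)
    numbers and is read once for each of the p - r later blocks.
    """
    p, remainder = divmod(n, m)
    if remainder:
        raise ValueError("this sketch assumes that m divides n")
    # m * (2n + 3 - (2r-1)m) is always even, so the division by 2 is exact.
    return sum((p - r) * (m * (2 * n + 3 - (2 * r - 1) * m) // 2)
               for r in range(1, p))

def drum_reads_closed_form(n, m):
    """Closed form n(n-m)(4n + 9 + m) / (12 m) of the same sum."""
    return n * (n - m) * (4 * n + 9 + m) / (12 * m)

n, m = 1000, 50
reads = drum_reads(n, m)
# Read time over multiplication time, with beta/M = 1.6 and F = 2.5 assumed.
overhead = reads * 1.6 / ((n ** 3 / 3) * 2.5)
print(reads, drum_reads_closed_form(n, m), round(overhead, 4))
```

For these values the overhead comes out close to $\beta/(mFM) = .0128$, which illustrates how a larger m reduces the cost.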

Even for sorting problems the use of a relatively small high-speed
memory and a slower backup memory is not too costly in time. Let us suppose
that we have a machine and a fast-access memory which is capable of sorting $N_0$
numbers. Let us see how this may be used to sort N numbers which are originally
stored on a drum. Let us write

    \[ N = r N_0 \]

and assume that r is an integer.

We may sort the N numbers into r sets of $N_0$ numbers and involved in
this is a reading of N numbers from the drum and a writing of these. This
reading and writing can be done in blocks of $N_0$ numbers. The question arises
as to what to do next. Now mesh the following numbers: a set of size K from
the first $N_0$, a set of the same size from the second $N_0$, ..., where

    \[ rK = N_0 . \]

In the process of sorting these numbers into $N_0$ ordered numbers, one set of K
numbers will be exhausted. Print out the first K sorted numbers and fill the
space previously occupied by the exhausted set of K numbers with the next K
numbers from the exhausted set. Thus the extra time involved in having a
backup store is the time required for two readings and writings of the data.
The percentage increase in time is

    \[ \frac{2RN + 2WN}{KN \log_2 N} = \frac{2(R+W)}{K \log_2 N} . \]
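
The two-pass procedure described above may be sketched as follows. This is an illustration under stated assumptions, not the Laboratory's routine: the drum is modelled as a Python list, the names `backup_sort` and `n0` are hypothetical, and sorted numbers are emitted one at a time rather than in batches of K.

```python
def backup_sort(drum, n0):
    """Two-pass sort of `drum` using a fast memory that holds n0 numbers."""
    if not drum:
        return []
    # Pass 1: sort the data in blocks (runs) of n0 numbers.
    runs = [sorted(drum[i:i + n0]) for i in range(0, len(drum), n0)]
    r = len(runs)
    k = max(1, n0 // r)               # K numbers of each run in memory, rK <= n0

    # Pass 2: mesh (merge) the runs, holding only K numbers of each in memory.
    buffers = [run[:k] for run in runs]
    next_read = [k] * r               # position of the next unread number per run
    output = []
    while any(buffers):
        # Choose the run whose buffered head is smallest.
        j = min((i for i in range(r) if buffers[i]), key=lambda i: buffers[i][0])
        output.append(buffers[j].pop(0))
        if not buffers[j]:            # set exhausted: refill from the same run
            buffers[j] = runs[j][next_read[j]:next_read[j] + k]
            next_read[j] += k
    return output

print(backup_sort([7, 3, 9, 1, 8, 2, 6, 4], n0=4))
```

Each number is read from and written to the drum twice, once per pass, which is the source of the factor 2 in the expression above.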



2.3 Speed Requirements on the Memory 

In section 1.1 it was stated that the quantity 

    \[ F = 1 + A\,\frac{L}{M} + (A+1)\,\frac{a_0}{M} + (A+1)\,\frac{a_1}{M} \]

was one of the quantities determining the time for completing a computation and 
it was shown that for a machine such as Illiac this quantity was about 2.8 when 
A = 11. 

With the present progress in circuitry it seems possible to design a 
computer with 

    M = 4 µs
    L = .3 µs
    $a_0$ = 1.5 µs
    $a_1$ = .75 µs (when orders are stored in pairs).

Hence

    \[ \frac{L}{M} = .075 , \qquad \frac{a_0}{M} = .375 , \qquad \frac{a_1}{M} = .1875 . \]
The corresponding figures for the Illiac are .08, .05 and .025
respectively. Hence we see that a machine with the same logical design as
Illiac but using the latest circuit techniques would have a factor of

    F = 1.56 + A(.64) = 8.6

for A = 11. The corresponding figure for Illiac is

    F = 1.08 + A(.16) = 2.8,

that is, for the new machine the factor F would be three times larger. Since
the time to do a computation is measured by F times the multiplication time, the
increase in the factor F vitiates in part the speed-up in multiplication time
that can be achieved. If the multiplication time is decreased by a factor of
200, the total time to do a problem is decreased by a factor of only 67. This
is because the terms $a_0/M$ and $a_1/M$ are so large for the present machine.
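
The two values of F, and the net gain in speed, follow directly from the formula of this section. The sketch below uses a hypothetical function name, `time_factor`; the ratios are those quoted above.

```python
def time_factor(A, L_over_M, a0_over_M, a1_over_M):
    """F = 1 + A*L/M + (A+1)*a0/M + (A+1)*a1/M."""
    return 1 + A * L_over_M + (A + 1) * (a0_over_M + a1_over_M)

A = 11
f_illiac = time_factor(A, 0.08, 0.05, 0.025)
f_new = time_factor(A, 0.3 / 4, 1.5 / 4, 0.75 / 4)
net_speedup = 200 * f_illiac / f_new
print(f"F (Illiac) = {f_illiac:.2f}, F (new) = {f_new:.2f}, net gain = {net_speedup:.0f}")
```

With unrounded ratios the net gain comes out near 65; the figure of 67 in the text corresponds to taking the ratio of the two F values as exactly three.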

If F is to be the same for a machine with a multiply time of 4 µs as
for Illiac and if the logic of the two machines were to be the same we would
have to have a word access-time from the memory of about .15 µs, that is, ten
times faster than seems possible with present techniques involving core memories.

It is the purpose of the next chapter to discuss methods of reorganizing 
the computer and adding extra equipment so that a reasonable fraction of this 
factor of three in loss of speed is recaptured. The computers there proposed 
manage to recoup this factor on some problems. They are no slower than sequential 
machines on other problems and therefore at the worst would be at least sixty 
times as fast as Illiac. 

2.4 Numerical Requirements on Word Length 

The word length for a given machine is influenced by arithmetic and 
organizational considerations. Mainly because of the former it is proposed to 
use a word length of 52 bits. There are two major arguments for this based on 
fixed point and floating point arithmetic respectively.

If a floating point number is represented as an exponent and fractional
part packed into one word, then the word length should be sufficient to maintain
accuracy in floating point arithmetic. A 42-bit fractional part seems to be
just adequate for this purpose. It is therefore proposed to use 10 bits as an
exponent for floating point numbers and 42 bits as the fractional part. We
shall discuss floating point arithmetic further in subsequent sections of this
chapter.

Even with fixed point arithmetic there is an argument for increasing 
the word length for the representation of numbers over that used in computers 
such as Illiac. In essence this is the following: With faster computers 
larger problems are done in which more computations are carried out and in 
which round-off plays a larger role. Hence more guard digits are needed. An 
estimate of how many additional digits are needed may be obtained as follows: 

Von Neumann and Goldstine (2) have shown that in inverting a matrix
the limiting value of n, the size of the matrix, for which the results are
accurate, is related to the number base $\beta$ and the number of places carried, s,
by the inequality

    \[ n < .15\,\beta^{s/4} . \]

Further the amount of time needed to invert a matrix is proportional
to the number of multiplications, that is, $n^3$.

Hence for a machine about 128 times faster than Illiac, we should be
able to invert a matrix of size $2^{7/3} n$ in the same time as that with which
Illiac deals with an n-th order matrix. If the results for this larger matrix
are to be as accurate as those obtained from the Illiac on the smaller matrix
we must have the number of bits carried on the fast machine, s', given by

    \[ s' = s + 4 \cdot \frac{7}{3} \approx 50 . \]

This problem seems to be the most stringent algebraic one for
determining the number of bits needed.
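
The number of additional places follows from the inequality above: multiplying n by the cube root of the speed ratio requires $\beta^{s/4}$ to grow by the same factor. The sketch below is illustrative; the function name `required_places` is hypothetical, and the value s = 40 bits for Illiac is an assumption made here and is not stated in this section.

```python
import math

def required_places(s, speedup, base=2):
    """Places s' needed so that a matrix speedup**(1/3) times larger is as accurate.

    From n < .15 * base**(s/4): scaling n by speedup**(1/3) adds
    4 * log_base(speedup) / 3 places.
    """
    return s + 4 * math.log(speedup) / (3 * math.log(base))

extra_bits = required_places(0, 128)    # 28/3, a little over 9 bits
s_fast = required_places(40, 128)       # assumes s = 40 bits for Illiac
print(f"extra bits = {extra_bits:.2f}, s' = {s_fast:.1f}")
```

Under that assumption s' is a little over 49, consistent with the round figure of 50.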

2. H. H. Goldstine and J. von Neumann, "Numerical Inverting of Matrices of
High Order," Bulletin American Mathematical Society, vol. 53, no. 11,
pp. 1021-1099, November 1947.
2.5 Floating Point Arithmetic

Because many problems of the complexity and magnitude requiring a
computer of the type discussed in this report are difficult to scale without
some preliminary calculations, it seems worthwhile to have automatic facilities
for floating point operation incorporated into the computer. However, as has
been pointed out by Metropolis (3), it is not clear that the usual methods for
performing floating point operations are the best ones that can be used.
Numerical experiments are being planned at the Digital Computer Laboratory to
help determine a reasonable set of rules for the automatic floating point
operations to be incorporated into the computer.

The difficulty in using floating point arithmetic arises because the
usual automatic floating point operations give no indication of relative error
of the fractional part of the machine representation of numbers and indeed do
not preserve relative errors in a computation.

Real numbers may be written as

    \[ x = 2^{x_e} x_f \]

where $x_e$ is the exponent of the number and $x_f$ is the fractional part. A
computer represents the number by

    \[ \bar{x} = 2^{x_e} \bar{x}_f \]

where

    \[ \bar{x}_f = x_f + \epsilon_x . \]

$\epsilon_x / x_f$ is then the relative error in the fractional part of the number and

    \[ \bar{x} = 2^{x_e} (x_f + \epsilon_x) . \]

3. N. Metropolis: "Floating Point Arithmetic," to appear in IRE Transactions
on Electronic Computers.
The quantities $x_e$ and $x_f$ are not uniquely determined by x. For fixed
point operation the computer user is required to restrict x so that

    \[ -1 \le x < 1 , \]

and then

    \[ x_e = 0 . \]

This implies that a problem is scaled before the computer is used. In some
cases this is done incorrectly in which case overflow occurs in the computation.
Overflow may occur in accumulating a sum or in an improper division (one in
which the denominator is less than the numerator).

To avoid scaling difficulties floating point operations are used. In
these the restriction is made that

    \[ \tfrac{1}{2} \le |x_f| < 1 , \]

which fixes $x_e$.

The problem in doing arithmetic is to keep track of the size of $\epsilon_x$
as compared to the size of $x_f$. The $\epsilon$ accrue during arithmetic operations and
their determination is complicated. The initial error made by inserting
the fractional part of a number in a machine is bounded by $\epsilon = 2^{-s}$ where
s is the number of binary places to which numbers are stored (or one more).



Then 



x 



and 




Se 



when we may neglect 
In any computation difficulties ensue when $x_f$ and $\epsilon_x$ are of the same
order of magnitude. In fixed point work this can be sensed from the magnitude
of $x_f$; that is, in a computation in which too small values of $x_f$ occur
($x_f \approx k\epsilon$, where k is a "small" integer), the $x_f$ have to be treated with
great care. In floating point the ability to sense troublesome $x_f$ is lost,
since all $x_f$ are made to lie between 1/2 and 1.

That $x_f$ and $\epsilon_x$ may become of the same order of magnitude is clear
from the subtraction process. Suppose we have two standardized numbers $\bar{x}$
and $\bar{y}$ with $x_e \ge y_e$; then one could form the difference between these as

    \[ \bar{d} = \bar{x} - \bar{y} = 2^{x_e - p} \left[ 2^p \left( x_f - 2^{y_e - x_e} y_f \right) + \epsilon_d \right] \]

where

    \[ \epsilon_d = 2^p \left( \epsilon_x - 2^{y_e - x_e} \epsilon_y \right) + \epsilon' , \]

and $\epsilon'$ is a rounding error. Now it is quite possible that although $x_f$ and $y_f$
are large compared to $\epsilon_x$ and $\epsilon_y$ respectively the quantity $(x_f - 2^{y_e - x_e} y_f)$ is
of the order of $\epsilon$. The exponent p occurring in the above measures the number
of "significant" zeros in the difference. Metropolis (3) suggests storing p as
zeros in front of the first non-zero digits of $d_f$, that is, he proposes not to
automatically standardize the results of the arithmetic operations.
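
The exponent p can be observed on any binary floating point machine. The sketch below is illustrative only: it relies on Python's `math.frexp`, which returns a fraction between 1/2 and 1 and so matches the standardized form used here, and the function name `significant_zeros` is hypothetical.

```python
import math

def significant_zeros(x, y):
    """Number p of leading zeros in the fraction of x - y before standardizing.

    The difference is formed at the exponent of the larger operand; p is the
    left shift that standardizing the result would then require.
    """
    d = x - y
    if d == 0:
        raise ValueError("difference vanishes; p is undefined")
    larger_exponent = max(math.frexp(x)[1], math.frexp(y)[1])
    return larger_exponent - math.frexp(d)[1]

x = 0.75
y = 0.75 - 2.0 ** -10
p = significant_zeros(x, y)
print(p, 2 ** p)
```

For these two operands the errors inherited from x and y are multiplied by $2^p$ relative to the standardized fraction of the difference, which is the effect described above.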

Then rules for performing arithmetic have to be provided. A
provisional incomplete set of rules (4) is:

Addition and Subtraction: Position number with smaller exponent by shifting it
to the right. Carry out addition or subtraction. If overflow occurs shift one
place right and adjust exponent. Never shift to the left to standardize the
result.
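
The addition rule may be sketched as follows. This is an illustration under assumptions made here and is not Metropolis's own formulation: a number is a pair (exponent, fraction), the fraction is an integer f standing for f * 2**-s, the choice s = 8 is arbitrary, and right shifts truncate rather than round.

```python
def add_unstandardized(a, b, s=8):
    """Add two numbers (exponent, fraction) without standardizing the result.

    A fraction is an integer f standing for f * 2**-s, with |f| < 2**s.
    Subtraction is obtained by negating the second fraction.
    """
    (ea, fa), (eb, fb) = a, b
    if ea < eb:                       # make `a` the operand with the larger exponent
        (ea, fa), (eb, fb) = (eb, fb), (ea, fa)
    fb >>= ea - eb                    # position the smaller number by a right shift
    e, f = ea, fa + fb
    if abs(f) >= 1 << s:              # overflow: shift one place right, adjust exponent
        f >>= 1
        e += 1
    return e, f                       # no left shift: leading zeros are kept

print(add_unstandardized((0, 192), (0, -191)))   # nearly equal operands
print(add_unstandardized((0, 192), (0, 128)))    # sum overflows the fraction
```

In the first call the result keeps seven leading zeros in its fraction instead of being shifted left, so the loss of significance remains visible in the stored number.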


4. These rules are based on those given by Metropolis in (3). They differ from 
his in the multiplication process as is discussed in the text. The Digital 
Computer Laboratory is very grateful to Dr. N. Metropolis for supplying a 
preprint of his paper to that Laboratory. 
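Multiplication: Given

    \[ \bar{x} = 2^{x_e} \bar{x}_f , \qquad \bar{y} = 2^{y_e} \bar{y}_f \]

and also $x_f \ge y_f$. Form

    \[ \bar{x}\bar{y} = 2^{x_e + y_e - p} \left( 2^p \bar{x}_f \bar{y}_f \right)_R
       = 2^{x_e + y_e - p} \left( 2^p x_f y_f + 2^p \epsilon_x y_f + 2^p \epsilon_y x_f + \epsilon' \right) \]

where the subscript R denotes rounding, $\epsilon'$ is the rounding error, the term in
$\epsilon_x \epsilon_y$ has been neglected, and p is determined so that

    \[ \tfrac{1}{2} \le 2^p x_f < 1 . \]

The relative error is

    \[ \frac{2^p \epsilon_x y_f + 2^p \epsilon_y x_f + \epsilon'}{2^p x_f y_f}
       = \frac{\epsilon_x}{x_f} + \frac{\epsilon_y}{y_f} + \frac{\epsilon'}{2^p x_f y_f} . \]

In order that the round-off process will not contribute appreciably to the
error in the product on a single multiplication, we require

    \[ \frac{2^p \epsilon_y x_f}{2^p x_f y_f} \ge \frac{\epsilon'}{2^p x_f y_f} . \]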
This requirement states that the dominant error in the product be larger than
the round-off error, that is, the size of the relative error in the product
will be that in the factors. Now $2^p x_f$ is made to lie between 1/2 and 1 by
the definition of p. Hence

    \[ \tfrac{1}{2} \epsilon_y \le 2^p x_f \epsilon_y < \epsilon_y . \]

Thus if $\epsilon_y$ is of the order of $\epsilon'$ this criterion will be violated. Note that
$\epsilon_y$ will become of the order of $\epsilon'$ after a sequence of multiplications of this
sort since it gets decreased in each multiplication.

Muller has suggested that a product be formed as

    \[ \bar{x}\bar{y} = 2^{x_e + y_e - p - 1} \left[ 2 \left( 2^p \bar{x}_f \bar{y}_f \right) \right]_R . \]

This has the effect of increasing p by 1 and now achieves the desired result
since it replaces the above inequality by

    \[ \epsilon_y \le 2^{p+1} x_f \epsilon_y < 2 \epsilon_y , \]

but the relative error is unaltered.

Division: The rules for this operation given by Metropolis for forming x/y
are:

(1) Shift x to the right if necessary until |x| < |y|.

(2) Standardize y.

(3) Form the rounded quotient of the resulting numbers.
These rules do not have the property that 

and involve both right and left shifts. For these reasons numerical experiments
are being planned to see if a better set of rules for division can be evolved. 
CHAPTER 3

PROPOSED ORGANIZATION OF THE COMPUTER



3.1 Introduction

The present state of computer components and computer circuits is
such that it seems possible to build an arithmetic unit with a multiplication
time (1) M = 3.5 µs and an addition time L = .35 µs. The description of an
arithmetic unit with these characteristic times will be found in Chapter 8.
These times are between 100 and 200 times faster than the corresponding times
of current computers such as Illiac.

However, corresponding progress in speeding up the memory access-time
has not been made. A large-capacity random-access memory will probably use
magnetic cores and if these are put together in the form of a "word-arrangement"
memory (cf. Chapter 6) the access-time per word would be a = 1.5 µs. This
seems to be the shortest access-time possible for a core memory using presently
available cores. This access-time is about 18 times as fast as the access-time
of Illiac's memory. Thus progress in memories lags by a factor of
between five and ten behind progress in arithmetic and control units.

For this reason the present-day machine designer must reconsider the
organization of the computer and see if it is possible to mitigate the effects
of the unbalance between the speeds of the arithmetic unit and the memory. It
is the purpose of this chapter to discuss various methods by which this can be
done and to propose machine organizations which incorporate these methods.

3.2 A Design Criterion (2)

The designer may have to choose one from many alternative designs
each with a different number of switching elements and a different speed of
computation. The latter quantity may depend on the problem posed to the
machine. If for a given class of problems the speed of computation can be
estimated sufficiently accurately then the criterion discussed below may be
used to choose between various designs. It should be noted that increased
complexity may produce a slowing down of each individual element because
the speed of an element is usually inversely proportional to the number of
elements to which it is connected.

1. The best present estimates of the average multiplication time place it
between 3.5 µs and 4 µs. This chapter is based on the lower figure.
Its conclusion would not be materially altered if the higher figure were
used.

2. It is proposed to apply this criterion in later work.

For a computer with n equally reliable switching elements with an
average life of a hours each, a breakdown will occur once every a/n hours,
on the average. The computing time lost per breakdown, called F(n) hours,
consists of the time to find and correct the faulty component, and the time
to repeat that portion of the calculation which was wrong. The fraction of
time during which the computer does useful calculation is

    \[ \frac{a/n}{a/n + F(n)} = \frac{1}{1 + \dfrac{nF(n)}{a}} . \]

A problem which requires T hours of faultless computer time would require,
on the average, $T\left(1 + \frac{nF(n)}{a}\right)$ hours to be solved.

Suppose that the same problem required T' faultless hours on a
computer with n' switching elements. Which computer is more efficient?
For the same equipment, one could construct n'/n computers of the first sort,
which would solve the problem in $\frac{n}{n'} T$ hours of faultless calculation.
Therefore the computer with n' elements is more efficient provided that

    \[ T' \left( 1 + \frac{1}{a}\, n'F(n') \right) < \frac{n}{n'}\, T \left( 1 + \frac{1}{a}\, nF(n) \right) \]

or provided $n'T'\left(1 + \frac{1}{a} n'F(n')\right) < nT\left(1 + \frac{1}{a} nF(n)\right)$. Therefore the most
efficient computer is one for which the function $nT\left(1 + \frac{1}{a} nF(n)\right)$ is a
minimum.

That is, we must have

    \[ -\frac{dT}{T} = \frac{1 + \dfrac{n}{a}\left( 2F(n) + n \dfrac{dF}{dn} \right)}{1 + \dfrac{n}{a} F(n)} \; \frac{dn}{n} . \]
The experience of the Digital Computer Laboratory with Illiac and with the
circuits contemplated for the new computer suggests that the quantity
nF(n)/a will be of the order of .05. If we assume that $n \frac{dF}{dn}$ can be neglected
with respect to F we have

    \[ -\frac{dT}{T} = \frac{1 + 2\dfrac{n}{a} F(n)}{1 + \dfrac{n}{a} F(n)} \; \frac{dn}{n} . \]

Thus a 1% increase in n should yield a 1.05% increase in speed to justify
itself. If

    \[ n \frac{dF}{dn} = F , \]

that is, F = cn, where c is a constant, we have

    \[ -\frac{dT}{T} = \frac{1 + 3\dfrac{n}{a} F(n)}{1 + \dfrac{n}{a} F(n)} \; \frac{dn}{n} \approx \left( 1 + \frac{2n}{a} F(n) \right) \frac{dn}{n} , \]

and setting nF(n)/a = .05 gives

    \[ -\frac{dT}{T} = 1.1\, \frac{dn}{n} . \]
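
The two break-even figures quoted above can be reproduced numerically. In the sketch below the function name and the parameter `elasticity`, standing for (n/F) dF/dn, are hypothetical conveniences introduced here; the value 0 corresponds to neglecting n dF/dn and the value 1 to F proportional to n.

```python
def breakeven_speed_gain(u, elasticity=0.0):
    """Factor c in -dT/T = c * dn/n at the minimum of n*T*(1 + n*F(n)/a).

    u = n*F(n)/a, and elasticity = (n/F) * dF/dn.
    """
    return (1 + u * (2 + elasticity)) / (1 + u)

constant_loss = breakeven_speed_gain(0.05)                     # F independent of n
proportional_loss = breakeven_speed_gain(0.05, elasticity=1)   # F proportional to n
print(f"{constant_loss:.3f} {proportional_loss:.3f}")
```

The results round to 1.05 and 1.1, the percentages of added speed that a 1% increase in n must yield under the two assumptions.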



3.3 The Speed of a Computer 

The speed of a computer, the reciprocal of the quantity T discussed
above, is a very difficult quantity to estimate for it depends on the problem
being solved, the mathematical methods used, and the sequence of instructions 
given the machine as well as on the various properties of the machine. More- 
over, the mathematical methods employed and the sequence of instructions given 
to the computer depend on the properties of the machine. For example, an 
integration of a partial differential equation may be done on one machine using 
a coarse mesh and high-order integration formulas and on another machine with a 
fine mesh and a low-order integration formula because the different methods are 
best suited to different machines . 
It was pointed out in Section 1.1 that for present-day computers,
T, the time spent by a computer in solving a problem for which the instructions
and operands are in the high-speed random-access memory of the computer, may be
written as a sum of two terms:

    \[ T = \mathcal{A} + S \]

where $\mathcal{A}$ is the arithmetic time, the time spent by the arithmetic unit in doing
useful and necessary arithmetic, and S is the sum of the time spent in transferring
instructions from the memory to the control and the time spent in transferring
operands and partial results to and from the arithmetic unit and the
random-access internal memory.

If a computer can perform arithmetic and can consult the memory
simultaneously, T satisfies the inequality

    \[ \max(\mathcal{A}, S) \le T \le \mathcal{A} + S . \]

The upper limit in this inequality holds for a sequential computer and the
lower limit holds for a computer (and a problem) such that every memory
access is perfectly overlapped with arithmetic, or conversely.

For those problems for which we may write (cf. Section 1.1)

    \[ T = NFM , \]

where N is the number of multiplications, M is the multiplication time and

    \[ F = 1 + A\,\frac{L}{M} + \frac{(A+1)(a_1 + a_0)}{M} , \]

we assume that the instructions are stored by pairs and use the
times given in Section 3.1 as those pertaining to the computer under
consideration.

It then follows that

    \[ \frac{\mathcal{A}}{M} = N \left( 1 + A\,\frac{L}{M} \right) = N \left( 1 + \frac{A}{10} \right) \]

    \[ \frac{S}{M} = \frac{N(A+1)(a_1 + a_0)}{M} = \frac{3}{2}\,\frac{N(A+1)\,a_0}{M} = .65\,N(A+1) . \]
Then we have

    \[ 1.18 \le \frac{S}{\mathcal{A}} \le 6.5 \]

where the lower limit obtains for A = 1 and the upper holds for A infinite.
For

    A = 5,   $S/\mathcal{A}$ = 2.6
    A = 8,   $S/\mathcal{A}$ = 3.25
    A = 10,  $S/\mathcal{A}$ = 3.58.

For a computer such as Illiac a/M is small (approximately .04).
The ratio $S/\mathcal{A}$ is correspondingly small and the access-time is not as
important a consideration in the design of a computer as it is when
present-day techniques are used.

The main problem confronting the computer designer is to decrease
the effect of S. This can be accomplished in general by reducing the number
of accesses to the main memory and by attempting to overlap the time that
is used in referring to the memory with the time that is used in doing useful
arithmetic.

For problems in which

    \[ T = NFM \]

this means that the number A is to be decreased and that the machine is to be
organized so that the equation for F is to be replaced by an expression such
as

    \[ F = \max\left( 1 + A/10,\; .65(A+1) \right) . \]
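
The tabulated ratios, and the difference between the sequential and the fully overlapped expressions for F, can be evaluated as follows. This is a sketch; the function names are hypothetical, and the constants 1/10 and .65 are the rounded ratios L/M and $(a_1 + a_0)/M$ used above.

```python
L_OVER_M = 0.1        # L/M for M = 3.5 µs, L = .35 µs
ACCESS_OVER_M = 0.65  # (a_1 + a_0)/M, rounded as in the text

def memory_to_arithmetic_ratio(A):
    """Ratio of memory time S to arithmetic time for A instructions per multiplication."""
    return ACCESS_OVER_M * (A + 1) / (1 + A * L_OVER_M)

def sequential_factor(A):
    """F when memory accesses and arithmetic do not overlap."""
    return 1 + A * L_OVER_M + ACCESS_OVER_M * (A + 1)

def overlapped_factor(A):
    """F when every memory access is perfectly overlapped with arithmetic."""
    return max(1 + A * L_OVER_M, ACCESS_OVER_M * (A + 1))

for A in (1, 5, 8, 10):
    print(A, round(memory_to_arithmetic_ratio(A), 2),
          round(sequential_factor(A), 2), round(overlapped_factor(A), 2))
```

For every A shown the memory term is the larger of the two, so even perfect overlap leaves F governed by the access-time rather than by the arithmetic.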



3.4 Red Tape

A large portion of the instructions in any program deals with counting
of iterations, constructing and modifying other instructions which are to be
used subsequently and in general determines what is to be done next rather
than performing arithmetic. This type of work is commonly referred to as red
tape work.

The control unit of a computer can be provided with an adder and
registers called B-lines (or index registers or modifying registers) so
that the majority of the red tape work could be done outside the arithmetic
unit and concurrent with its operation. These facilities would produce a
substantial decrease in the quantities S and $\mathcal{A}$. For those problems for which
we may write S and $\mathcal{A}$ in terms of the number of instructions A as above, the
effect of doing red tape work outside the arithmetic unit is to decrease the
quantity A by a factor of about 2. Thus for a problem in which A = 10
for a machine such as Illiac, the addition of such a control will reduce
$S/\mathcal{A}$ to 2.6.

3.5 Storage of Addresses

Addresses to the memory are needed by instructions when the instructions
are executed. In many computers these are stored with the instructions in the
memory. It was pointed out in the previous section that these addresses can
be supplied to the instructions in modified form at the time they are needed,
that is, when the instruction is being executed by the control.

This suggests that space for storing these constructed addresses
not be provided in the main memory. This space can then be used for more
efficient storage of instructions.

The computer design described below makes use of this feature and 
as a result is able to keep the arithmetic unit usefully occupied for a 
longer time per memory access for instructions than it otherwise would. 

The number of memory words of instructions needed for a given problem 
is thereby reduced, and as a result the quantity S is reduced. 

3.6 Storage of Intermediate Results 

The numerical operands are of two sorts: initial or final numbers, 
and intermediate results. The distinction between these two classes may be 
illustrated by the following example. Compute

    \[ \sum_{i=1}^{n} a_i b_i . \]

Writing the sum recursively as

    \[ S_0 = 0 , \qquad S_i = a_i b_i + S_{i-1} , \]

then the sequence of numbers $S_i$ (i = 1, ..., n-1) are intermediate results.
$S_0$ and the 2n numbers $a_i$ and $b_i$ are initial essential numbers and the number
$S_n$ is the final result. In this example the minimum number of memory accesses
for numbers would then be 2n + 1 if the result was to be stored in the memory.

If there were some extra storage space outside the main memory (for
this example carried out in single precision a single register would
presumably do) the number of memory accesses involving intermediate results could
be reduced substantially. The design described below uses a number of such
registers.
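
The saving can be made concrete by counting accesses in the example. The sketch below is only an illustration: the function name is hypothetical, and the count without a register (one fetch and one store of the partial sum at every step) is an accounting assumed here for comparison and is not taken from the text.

```python
def inner_product_accesses(a, b, use_register=True):
    """Return (sum of a[i]*b[i], number of main-memory accesses for numbers)."""
    accesses = 0
    s = 0                              # S_0 = 0
    if not use_register:
        accesses += 1                  # store S_0 in the main memory
    for a_i, b_i in zip(a, b):
        accesses += 2                  # fetch a_i and b_i
        if not use_register:
            accesses += 2              # fetch S_{i-1}, then store S_i
        s = a_i * b_i + s
    if use_register:
        accesses += 1                  # store only the final result S_n
    return s, accesses

print(inner_product_accesses([1, 2, 3], [4, 5, 6]))
print(inner_product_accesses([1, 2, 3], [4, 5, 6], use_register=False))
```

With a register holding the partial sum the count is the minimum 2n + 1 given above; under the assumed accounting the count without one is 4n + 1.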

3.7 Access for Instructions and Operands

The history of words in the memory representing instructions in a
computer provided with B-registers is quite distinct from that of words
representing operands. Instructions normally come from consecutive memory
locations and go into the control. They are seldom written into the memory.
Most instructions obeyed by the control are situated in one or more inner
loops which are obeyed repetitively a large number of times. However, the
number of instructions in an inner loop is highly variable. For example in
problems involving linear algebra such as matrix multiplication inner loops
are very short and could be replaced by one special-purpose instruction
each; in partial differential equations inner loops can consist of 50, 100,
200 or more instructions.

The designer may take advantage of these facts to decrease the
harmful effects of the access-time for instructions and for operands. If
very fast-access storage is provided outside the memory, this may be used
to hold the instruction being executed and other instructions. This memory
may be filled by using multiple readouts of instructions in one read-time
from the 1.5 µs memory. In that way the access-time per instruction may be
decreased. If this storage can hold an entire inner loop which is to be
executed n times, multiple readouts of instructions do not influence the
time considerations very much since each instruction is used n times per
readout.

Since inner loops are so variable in size and since the acquisition
of a moderately large-capacity very high-speed memory seems difficult to
achieve at present, for technical reasons, it seems advisable to
examine other methods for taking advantage of the behavior of instructions
described above.

If a few registers were provided outside the 1.5 µs memory these
could be used to store the current instruction and some following ones.
Moreover, one could attempt to fill them from the memory while the
arithmetic unit was engaged in multiplication or division, that is, while it
could not be calling for new operands.

Further, the information contained in these registers and B-registers
would be available to the control, which could be consulting the memory for
operands needed in the execution of subsequent instructions while the
arithmetic unit operated. Storage for these operands would have to be provided.

3.8 Summary of Useful Additional Storage

In order to decrease the effects of the term S in the expression
for the time of a computation, we have seen uses for the following storage
facilities in addition to those provided on a machine such as Illiac.

I - B-registers for address modification

II - Very fast-access memory for

    a. storage of intermediate results,
    b. storage of instructions which may be obtained
       while the arithmetic unit is occupied,
    c. storage of operands, needed later, which may
       be obtained while the arithmetic unit is
       occupied.

The capacities of the memories which could be used effectively 
vary from problem to problem. Hence, if one single memory could be provided 
for all three purposes and parts of it used interchangeably, this would have 
many advantages over fixing the amount of storage once and for all for each 
of the three purposes listed. Another approach would be to provide a minimum
amount of equipment for each of these purposes. 

Both approaches have been studied here. The latter is described
in the subsequent sections of this chapter and a version of the former in an 
appendix to this chapter. 



3.9 Size of the Core Memory 

For many problems, the speed of a computer depends, in a rather
complicated way, on the speed of its main random-access memory. A large
core memory lends itself to more complicated numerical methods and table
look-up procedures, whose use can result in a faster program. Furthermore,
for problems whose temporary storage requirements exceed the capacity of 
the core memory, data must be held on the drum or on magnetic tapes, and 
be sent to or from the core memory in blocks. Unless the core memory is 
large, an inordinate amount of time may be consumed in transferring data 
to and from these auxiliary memories. The data transfer problem will be 
described for the drum. For magnetic tapes similar considerations would 
apply. 

From the discussion given in Section 2.1 it is evident that the
percentage decrease in total time to do a problem on a computer with a
drum backup memory becomes smaller as the capacity of the main random-access
memory increases beyond a critical point. The reason for this
fact is connected with the distribution of time spent in reading and
writing on a drum. Suppose that the average random-access time (the time
for the drum to rotate to the point where the first of a sequence of
consecutive words is under the reading heads) is 8.5 ms or 8500 µs, and that
thereafter successive words are read at the rate of one every 6 µs. The
time required to read a block of N consecutive words to or from the drum is
then

    8500 + 6N µs,

and the average time per word for blocks of various sizes is
(8500/N + 6) µs, that is,

      8506  µs   if N = 1
        91  µs   if N = 100
        23  µs   if N = 500
     10.25  µs   if N = 2000
         6  µs   if N = ∞

The percentage decrease in this time obtained by increasing N beyond 2000
is small. This implies that there should be space in the core memory for 
at least one 2000-word block of temporary storage. For some problems for 
which not all data are consecutive, or for which further space must be 
allowed for the results of calculation which are later to be transferred 
to the drum, space for several blocks, each of the order of 2000 words, 
should be provided. 
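
The table above can be reproduced with a few lines of code. The sketch
below is illustrative only; the function name and default parameter names
are not from the report, and the figures 8500 µs and 6 µs are the ones
assumed in the text.

```python
def drum_time_per_word_us(n_words: int,
                          latency_us: float = 8500.0,
                          word_us: float = 6.0) -> float:
    """Average transfer time per word for a block of n_words consecutive words."""
    if n_words < 1:
        raise ValueError("a block must contain at least one word")
    # Total block time is latency + word_us * n_words; divide by the block size.
    return latency_us / n_words + word_us


if __name__ == "__main__":
    for n in (1, 100, 500, 2000, 4000):
        print(f"N = {n:5d}: {drum_time_per_word_us(n):8.3f} µs per word")
```

Doubling the block from 2000 to 4000 words saves only about 2 µs per word,
against a saving of 68 µs per word in going from 100 to 500 words.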

When the time for useful and necessary calculation is of the same 
order of magnitude as the time required for drum operations, a considerable 
increase in over-all speed is obtained if drum reads and writes are performed 
concurrently with arithmetic. The core memory, normally being used by 
control and the arithmetic unit, would then be only occasionally interrupted 
to either supply a word to the drum or accept a word from the drum. In 
such a mode of operation, space for one further block of data, in addition
to that already mentioned, is required to hold the data for this autonomous
transfer.

Finally, there is a class of problems, which includes sorting and
filing, for which the data rate of no auxiliary memory now under development
seems sufficient. A mode of operation in which the computer is time-shared
between two different problems would increase the duty cycle of the
arithmetic unit. For such a mode of operation to be practical, there should be
enough space in the core memory for two problems.

It is believed that an 8192-word core memory is sufficiently large 
to satisfy these requirements. 



3.10 A Computer with Small Buffer Storage 

We begin the discussion of this type of computer by listing the 
equipment required. 

Main Memory 

An 8192-word core memory with an access-time of 1.5 µs for both
reading and writing is used. The first 1.0 µs of a read operation accounts
for the readout, and a further enforced 0.5 µs is required to regenerate the
word. A memory register, called Z, is used for writing and regeneration. When
a number to be written has been placed in Z, the arithmetic unit may proceed
with further calculation.

Fast-Access Registers 

Addressable: The two registers A, Q of the double-length accumulator,
and two further registers R^, R^.

Non-Addressable (from the point of view of a programmer): Two operand
registers X, Y, the memory register Z, and two instruction registers O_1, O_2.

B-Lines: Up to twelve 13-bit address registers which may be used
interchangeably as counters, address-modifiers and addresses.

The connections with the core memory are shown in the following
diagram.

A word read from the core memory normally goes to one of O_1, O_2,
X, Y and also goes to Z for regeneration. For the writing operation, which
necessarily is preceded by a read operation, a gate prevents the word read
from reaching Z, and Z is set from the arithmetic unit. Since the first
part of a write operation consists of reading out the word in the memory
location referred to, it is possible to read one word into, say, X or Y and
write another word back, thereby exchanging two words in 1.5 µs. A gate
from Z back to the reading bus allows a word previously read from the core
memory and placed in Z to be transferred later to one of O_1, O_2, X, Y.



3.11 Words and Instructions

A 52-bit word may represent one of the following: a 52-bit fixed
point number, a packed floating point number consisting of a 42-bit numerical
part and a 10-bit exponent, or four 13-bit control groups.
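
The packing of four 13-bit control groups into one 52-bit word can be
sketched as follows. The ordering chosen here (first control group in the
most significant bits) is an assumption made for the illustration, as are
the function names; the text does not fix either.

```python
GROUP_BITS = 13
GROUPS_PER_WORD = 4
GROUP_MASK = (1 << GROUP_BITS) - 1


def pack_word(groups):
    """Pack four 13-bit control groups into one 52-bit word (first group highest)."""
    if len(groups) != GROUPS_PER_WORD:
        raise ValueError("a word holds exactly four control groups")
    word = 0
    for group in groups:
        if not 0 <= group <= GROUP_MASK:
            raise ValueError("control group does not fit in 13 bits")
        word = (word << GROUP_BITS) | group
    return word


def unpack_word(word):
    """Split a 52-bit word back into its four 13-bit control groups."""
    if not 0 <= word < 1 << (GROUP_BITS * GROUPS_PER_WORD):
        raise ValueError("word does not fit in 52 bits")
    return [(word >> (GROUP_BITS * (GROUPS_PER_WORD - 1 - k))) & GROUP_MASK
            for k in range(GROUPS_PER_WORD)]
```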

An instruction consists of either one or two control groups. (In
the case of two control groups, it is permitted that the instruction consist of
the last 13 bits of one word and the first 13 bits of the next.) Some
instructions always require one control group, some always require two, and
some may be short or long depending on whether one of the first 13 bits is
0 or 1.

A short instruction (one control group) consists of three parts:

    F    7 function bits
    B    4 bits
    C    2 bits called C_1 and C_2

The significance of B and C depends on the value of F.

A long instruction (two control groups) consists of F, B, C, and also

    N    a 13-bit address.

The significance of N also depends on the value of F.

Arithmetic Instructions

B represents an address. If B = 0, 1, 2, 3, that address is 0, 1, 2, 3,
but if B = 4, 5, ..., 15, that address consists of the 13-bit integer in B-line
B_4, B_5, ..., B_15. If C_1 = 0 the instruction is short. A short instruction
with B = 0, 1, 2, 3 refers to A, Q, R^, R^ respectively. If C_1 = 1 the instruction
is long, and the address defined by B is added to N. If C_2 = 1, the number in
the B-line, after it is used to form the address, is increased by one.
Therefore any arithmetic instruction can also control counting in a B-line.
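
The address rules for arithmetic instructions can be sketched in code.
Several details below are assumptions made for the illustration and are not
stated in the text: the bit layout of a control group (F in the high 7 bits,
then B, then C_1, C_2), the reduction of sums modulo 2^13, and all names.

```python
from typing import Dict, NamedTuple

ADDRESS_MODULUS = 1 << 13   # 13-bit addresses


class ControlGroup(NamedTuple):
    f: int    # 7 function bits
    b: int    # 4-bit B field
    c1: int   # 0 = short instruction, 1 = long instruction
    c2: int   # 1 = increase the B-line by one after use


def decode(group: int) -> ControlGroup:
    """Split a 13-bit control group, assuming the field order F, B, C_1, C_2."""
    if not 0 <= group < 1 << 13:
        raise ValueError("control group does not fit in 13 bits")
    return ControlGroup(group >> 6, (group >> 2) & 0xF, (group >> 1) & 1, group & 1)


def effective_address(cg: ControlGroup, n: int, b_lines: Dict[int, int]) -> int:
    """Form the address of an arithmetic instruction and update the B-line.

    b_lines maps a B-line number (4..15) to its 13-bit contents.
    n is the second control group of a long instruction (ignored if short).
    """
    # B = 0..3 stands for the address 0..3 itself; B = 4..15 names a B-line.
    base = cg.b if cg.b < 4 else b_lines[cg.b]
    address = base if cg.c1 == 0 else (base + n) % ADDRESS_MODULUS
    if cg.c2 == 1 and cg.b >= 4:
        # The B-line is increased only after it has been used.
        b_lines[cg.b] = (b_lines[cg.b] + 1) % ADDRESS_MODULUS
    return address
```

For a short instruction with B = 0, 1, 2, 3 the returned value names one of
the four addressable fast registers rather than a core memory location.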

Jump Instructions (control transfer instructions)

Most jump instructions are long; N refers to the word containing the
next instruction if the condition is satisfied, and C gives the position inside
that word (which of the 4 control groups is to be the first control group of
that instruction). The address N of an unconditional jump instruction is
modified by the address represented by B, as explained above. Conditional
jump instructions have unmodified addresses. There are two types: B-line
tests and arithmetic tests. For these, B refers to the B-line to be tested,
or one of up to 16 arithmetic conditions to be satisfied. This latter feature
increases the number of possible orders by using B, in some cases, for 4
further function bits.

The only short jump instructions are the so-called "short loop"
instructions, which increase or decrease the integer in the designated B-line
by 1 (modulo 2^13). If the result is not zero, control is transferred to the
first control group of the current word or the previous word, depending on
whether C_2 = 0 or 1. These generalized repeat instructions allow short
inner loops to be held in O_1 and O_2 during their execution. The instructions
need be read only once from the main memory.
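
The counting behavior of an increasing short loop instruction can be
sketched as below. The sketch assumes the B-line is preset to 8192 - n, as
in the example of Section 3.13; the function name is hypothetical.

```python
MODULUS = 1 << 13   # B-lines hold 13-bit integers


def short_loop_passes(n: int) -> int:
    """Number of passes through a loop whose B-line is preset to -n mod 2^13."""
    if not 1 <= n < MODULUS:
        raise ValueError("n must satisfy 1 <= n < 8192")
    b_line = (MODULUS - n) % MODULUS
    passes = 0
    while True:
        passes += 1                        # one execution of the loop body
        b_line = (b_line + 1) % MODULUS    # short loop: increase by 1 mod 2^13
        if b_line == 0:                    # zero result: the jump is not taken
            return passes
```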

B-line Arithmetic 

The short B-line instructions transfer the least significant 13 bits
of the accumulator (A) to the designated B-line, or replace the contents of the
accumulator with the non-negative integer held in the B-line. There are three
classes of long B-line instructions depending on whether N represents

    an integer operand,
    a core memory address, or
    the name of another B-line.

3.12 Controls 

The contents of the order registers O_1 and O_2 are used by two controls,
called the advanced control and the arithmetic control. The function of advanced
control is to scan instructions not yet executed by arithmetic control, determine
what data are required from the memory, and acquire them. So far as possible,
both controls operate simultaneously. For example, during the time of an average
multiplication (3.5 µs) there is more than enough time to read two words from
the core memory. Ideally, the memory and the arithmetic unit would both operate
continuously, and the overall speed of the computer operating in this parallel
mode would be twice that of a sequential machine with the same number of extra
registers. In practice, one control will sometimes have to wait for the other,
so the ratio of speeds is not quite as favorable as 2:1. One control will have
to wait for the other if one of the following situations occurs:

(a) its operation requires the use of equipment currently being used by the
other control,

(b) its correct operation depends on a result not yet obtained by the other
control,

(c) there is no space to hold further data.

Interlocks must be provided to detect these cases.

For a computer with a B-modification system, most operands from core
memory locations have addresses depending on the contents of a B-line, and a
very high percentage of conditional jump instructions are B-line tests.
Advanced control executes all B-line instructions.

A problem arises in the execution of an instruction requiring a write
to the core memory. Operands may be obtained in advance of a computation, but
results must be stored after the computation is finished. In the proposed
design the advanced control may defer the execution of a write instruction and
may read further operands during the time required to compute the result to be
stored.



3.13 Example

Calculate c_i = a_i b_i for i = 0, 1, 2, ..., n-1. Assume that the a_i, b_i
and c_i are stored consecutively and that

    B_4 holds the address of a_0,
    B_5 holds the address of b_0,
    B_6 holds the address of c_0,
    B_7 holds 8192 - n, that is, -n mod 2^13.

The program requires one word:

    F                                   B    C

    Replace A by a_i                    4    01
    Multiply A by b_i                   5    01
    Store the result in location c_i    6    01
    Short loop                          7    00


Remarks

1. For the first three (arithmetic) instructions C_1 = 0, so these are short
instructions.

2. The capacity of O_1, O_2 is two words, which is more than enough to hold this
loop. Only one instruction read is required for this loop, an instruction read
preliminary to executing it for i = 0. For i ≠ 0, advanced control recognizes
that the instructions are still in the fast memory.

3. Since C_2 = 1 for the arithmetic instructions, each execution increases the
address in a B-line by one, which is equivalent to changing the subscript i
to i + 1 for that variable.

4. The address of the last instruction is given by C = 00, which means "the
first control group of this word" as shown by the arrow.

5. For definiteness, suppose that multiplication requires 3.5 µs exclusive
of core memory access, and that the three other instructions each require
0.25 µs for operations exclusive of main memory accesses. The time for
arithmetic, per execution of the loop, is 3.5 + 3(0.25) = 4.25 µs. The time
for core memory accesses consists of one initial read for the instructions
and then two reads and one write per execution of the loop. The average memory
time is therefore

    1.5 (1 + 3n)/n µs.

For n reasonably large, this average value is very close to 4.5 µs. Since
the arithmetic time is 4.25 µs and S = 4.5 µs, a sequential computer requires
8.75 µs for each execution of this loop, and a computer capable of simultaneous
arithmetic and memory operations should require max(4.25, 4.5) = 4.5 µs.

6. A sequential machine executing this program would use the memory as follows:

    read the word of instructions
    read a_0
    read b_0
    write c_0
    read a_1
    read b_1
    write c_1
    etc.

If the memory were used in this way, a delay of at least 3.5 µs
(one multiply time) must elapse between "read b_i" and "write c_i", since
multiplication cannot begin until b_i has been obtained, and "write c_i" cannot
begin until the multiplication is completed.

Advanced control will be designed to use the core memory in the
following order:

    read instruction word
    read a_0
    read b_0
    read a_1
    read b_1
    write c_0
    read a_2
    read b_2
    write c_1
    etc.

The first multiplication takes place while a_1 and b_1 are being read, and the
second multiplication takes place during part of the time required to write c_0
and read a_2 and b_2.

It should also be noted that the successful overlap of memory use and
arithmetic requires, in this example, that advanced control read the operand
required for the next use of the multiply instruction during the current
multiplication. This is possible because the advanced control performs B-operations
and executes jump instructions. Therefore, after arithmetic control has
obtained its operand, advanced control is allowed to process the instruction
again.
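
The timing argument of this example can be checked numerically. The sketch
below is an illustration under the figures assumed in the example (1.5 µs
per core memory access, 3.5 µs per multiplication, 0.25 µs for each of the
other three instructions); the names are not from the report, and the
overlapped figure is the idealized bound max(arithmetic, memory), ignoring
any waiting between the two controls.

```python
ACCESS_US = 1.5      # one core memory read or write
MULTIPLY_US = 3.5    # multiplication, exclusive of memory access
OTHER_US = 0.25      # each of the three other instructions in the loop


def loop_times_us(n: int) -> dict:
    """Average times per execution of the loop c_i = a_i * b_i, for n passes."""
    if n < 1:
        raise ValueError("the loop must be executed at least once")
    arithmetic = MULTIPLY_US + 3 * OTHER_US
    # One initial instruction read, then two reads and one write per pass.
    memory = ACCESS_US * (1 + 3 * n) / n
    return {
        "arithmetic": arithmetic,
        "memory": memory,
        "sequential": arithmetic + memory,
        "overlapped": max(arithmetic, memory),
    }


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, loop_times_us(n))
```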

Conditional arithmetic jumps 

When advanced control encounters a conditional arithmetic jump, it 
cannot predict with certainty which instruction will be executed next until the 
previous arithmetic has been completed. In this situation, if instruction 
96 (see order code) is used by the programmer, advanced control waits
for arithmetic control, whereas if instruction 95 is used by the programmer,
advanced control reads the word to be jumped to and waits to make sure that
it is the next word of instructions to be executed.

3.14 The Design of the Two Controls

The controls are designed in such a way that there are no rules or
conventions which the coder must follow. If the coder knows something about
the operation of the two controls, he may be able to write a faster
program, but it is not necessary even to know of the existence of O_1, O_2, X, Y, Z
or advanced control in order to write a correct program. One of the most 
important interlocks between the two controls is this: if advanced control 
discovers the need for a word from some memory location, and it has previously 
encountered a store instruction to that memory location which has not been 
performed, then advanced control waits. In other words, advanced control does 
not take an old value if a new one is in the course of being computed. 

Considerable equipment is devoted to reducing the number of core 
memory accesses and to performing them as early as possible. Function decoding 
is more complex for a highly compressed order code. Addresses are held in fast 
registers (B-lines), powers of 2 are generated inside the arithmetic
unit, extra registers are provided for the storage of intermediate results,
and advanced control is provided with equipment to recognize when a core memory
word is actually held in X or Y already. Because O_1 and O_2 have a capacity of
8 control groups, advanced control can process control groups up to 7
ahead of arithmetic control. Since at least two operands could be obtained 
during the time of one multiplication, two data buffer registers X, Y are 
provided. If it does not conflict with other considerations it is planned 
to use Z in the following way: if there is nowhere else to put a word to be 
read from the core memory, it is temporarily left in Z, and later transferred 
to its correct destination. 

Core memory write instructions are executed when arithmetic control 
reaches them, but advanced control need not wait for arithmetic control. 
Logically, it would be possible to defer a single write operation still further, 
provided there was a register to hold the word to be written. Then the core 
memory would be written into only when it was not needed for anything else. 
Since this requires an extra register, this feature has not been included.

To simplify the design of the controls, operands are read from the
core memory in the order in which they are required by arithmetic control, and
the next word of instructions is read only after advanced control has processed
the current instruction word. Because some loops of instructions are held in
O_1, O_2 during several executions of each instruction, neither control can alter
control groups in O_1, O_2 during the execution of an instruction.

The following additional equipment is required to sequence the two
controls:

D_1 is a 3-bit counter which holds the "address" in O_1, O_2 of the control
    group currently being executed by arithmetic control. The four control
    groups in O_1 are numbered 0 to 3, and those in O_2 are numbered 4 to 7.
    The most significant bit of D_1 indicates the register (O_1 or O_2), and
    the remaining two bits indicate the position inside the register. D_1
    is used to operate a selector to read a control group to the order
    register of arithmetic control. Long instructions require two uses
    of this selector.

D_2 is a 3-bit counter, similar to D_1, used by advanced control.

D_3 is a 13-bit control counter used by advanced control. D_3 gives the
    location of the next word of instructions.

A_x is a 13-bit register, which holds the address of the word in the
    register X.

A_y is a 13-bit address register for the word in Y.

A_z is a 14-bit register. One bit is set to 1 when advanced control
    encounters a write instruction, and the address from that instruction
    is copied into the remaining 13 bits of A_z. The extra bit is set to
    zero when the write instruction is obeyed by arithmetic control.
    Therefore A_z holds the address of any unexecuted write instruction.

K_x is a 2-bit counter which is increased by one whenever advanced control
    finds a use for X, and is decreased by one each time arithmetic control
    uses X.

K_y is a 2-bit counter for Y whose action is analogous to K_x.

Registers O_1 and O_2 are each extended by 4 bits, one for each control
group. The extra bit in each control group may be set by advanced control to
indicate, in questionable cases, which of X, Y holds the operand. The extra
bit is read by arithmetic control.

Sequencing of Instructions

Suppose that the instruction currently being obeyed by advanced control
is not a jump instruction. If it is a multiply or divide instruction, and if
also A_z indicates that there is an unexecuted store operation to be done, after
the operand for this multiply or divide instruction has been obtained, advanced
control waits until the store operation has been initiated before proceeding to
the next instruction. Otherwise, on completion of the instruction, D_2 is
increased by one, mod 8, provided the result is not equal to D_1. If D_2 + 1 = D_1,
the next instruction is being executed by arithmetic control, so advanced
control waits until D_2 + 1 ≠ D_1. If, by counting, the most significant digit of
D_2 was changed, then either

(a) advanced control has processed all instructions in O_1 and O_2 and
    must read another word of instructions,
or (b) O_1, O_2 contain an inner loop of between 1 1/4 and 2 words, and
    advanced control has passed from the last control group of the
    first of these words to the first control group of the second word.

A flipflop is provided for the purpose of detecting (b) above. (b)
can only occur if one of the short loop instructions (b' = b + 1 or b' = b - 1,
and jump to the left-hand side of the last word if b' is non-zero) has been
obeyed as a jump. When such a jump is executed, this flipflop is set to one.
If, as a result of counting, the most significant bit of D_2 is changed, then
instead of reading a word of instructions, this short loop flipflop is set to
zero, and advanced control proceeds to execute the next control group.

In case (a) above, advanced control next compares the most significant
digits of D_1 and D_2. If they are not equal, advanced control reads the word
from the address given by D_3 into the register (O_1 or O_2) given by the most
significant bit of D_2, increases D_3 by one, and proceeds to execute the control
group given by D_2. If the most significant bits of D_1 and D_2 are equal,
arithmetic control is still executing a control group in the register (O_1 or O_2)
to be read into next, so advanced control first reads the word (whose address
is given by D_3) into Z, then waits until the most significant digits of D_1 and
D_2 disagree, and then proceeds as before.

Two operations, "read from the core memory" and "read into Z from
the core memory", require further explanation. When a word is to be read
from the core memory, its address is compared with A_z. If the extra bit in
A_z is 1 and the addresses coincide, the number wanted has not yet been
computed, and the read is inhibited and advanced control waits. Then, when the
extra bit becomes zero, indicating that arithmetic control has placed the
word in Z, it is copied from Z to its destination. If the extra bit of A_z
is zero, or the addresses differ, the word is read from the core memory in
the usual way.
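
The comparison with A_z described above amounts to a two-way decision,
sketched below. The type and function names are hypothetical, and the
sketch only reports which source supplies the word; it does not model the
waiting itself.

```python
from typing import NamedTuple


class PendingStore(NamedTuple):
    """Model of A_z: one extra bit plus a 13-bit address."""
    extra_bit: int   # 1 while a store instruction is still unexecuted
    address: int     # address of that store instruction


def read_source(address: int, a_z: PendingStore) -> str:
    """Decide where advanced control obtains the word at `address`."""
    if a_z.extra_bit == 1 and a_z.address == address:
        # The wanted number is still being computed: inhibit the read,
        # wait for the extra bit to clear, then copy the word from Z.
        return "wait, then copy from Z"
    return "read from core memory"
```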

After arithmetic control has obtained the operand for a control
group, D_1 is increased by one. Before executing another control group,
D_1 and D_2 are compared. If D_1 = D_2, arithmetic control waits until D_1 ≠ D_2.
Note that D_2 is increased provided it does not become equal to D_1, whereas
D_1 is increased regardless of D_2.
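
The asymmetry between the two counters can be sketched as follows. This is
a simplified model with hypothetical function names; it covers only the
counting rules stated above and ignores jumps and the short loop flipflop.

```python
def advance_d2(d1: int, d2: int):
    """Advanced control: step D_2 mod 8 unless the result would equal D_1.

    Returns (new_d2, must_wait).
    """
    stepped = (d2 + 1) % 8
    if stepped == d1:
        return d2, True      # that control group is still in use: wait
    return stepped, False


def advance_d1(d1: int) -> int:
    """Arithmetic control: D_1 is increased regardless of D_2."""
    return (d1 + 1) % 8


def arithmetic_must_wait(d1: int, d2: int) -> bool:
    """Arithmetic control waits while D_1 = D_2."""
    return d1 == d2


def order_register(d: int) -> str:
    """The most significant bit of a counter selects O_1 (0..3) or O_2 (4..7)."""
    return "O_2" if d & 0b100 else "O_1"
```

Since D_2 may never step onto D_1, advanced control runs at most 7 control
groups ahead of arithmetic control.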

Jump Instructions 

When advanced control encounters an unconditional jump instruction,
it waits until the first bit of D_1 and of D_2 agree, then it complements the
first bit of D_2, replaces the other 2 bits of D_2 by the digits of C, replaces
D_3 by the address from the current instruction, and performs the operations
previously described for reading a word to one of O_1 or O_2. When arithmetic
control encounters an unconditional jump, it complements the first bit of D_1
and replaces the other two bits of D_1 by C, compares D_1 with D_2 and executes
the next control group when D_1 ≠ D_2.

Conditional jump instructions are executed by advanced control. If
the condition depends on the contents of a B-line, advanced control may execute
the jump immediately, whereas, if the condition depends on a result in the
arithmetic unit, advanced control waits until D_1 = D_2 before making the test.

Two flipflops, F_1 and F_2, are required for conditional jump instructions.
F_1 is zero unless advanced control has encountered a conditional jump
instruction and arithmetic control has not. F_1 is set to zero when arithmetic control
obeys a conditional jump instruction. If advanced control encounters a
conditional jump instruction and F_1 is already one, it waits until F_1 becomes zero
before setting it to one and proceeding. F_2 is used to indicate to arithmetic
control whether the jump was executed or not.

When advanced control encounters a conditional jump instruction, it
acts as follows:

(1) If the jump is an arithmetic test likely to be obeyed (order 95),
    it reads the word, whose address is given by the current instruction,
    into Z.
(2) It waits until F_1 = 0.
(3) If the test involves an arithmetic result, it waits until D_1 = D_2.
(4) It makes the test. If the jump is not obeyed it sets F_2 = 0 and
    proceeds to the next control group. If the jump is obeyed it sets
    F_2 = 1 and proceeds to (5).
(5) If the instruction is not a "short loop" instruction, it follows the
    procedure described above for unconditional jumps. If it is a short
    loop instruction, it complements the most significant digit of D_2 if
    C = 01, or leaves it the same if C = 00, sets the two other bits of
    D_2 to zero, waits until D_1 ≠ D_2, and then proceeds to execute the
    next control group.

When arithmetic control encounters a conditional jump instruction, it
sets F_1 = 0 and either counts (if F_2 = 0) or changes D_1 in a similar fashion
to D_2 (if F_2 = 1).

Store Instructions 

When advanced control encounters a store instruction, it waits (if
necessary) until the extra bit of A_z is zero, indicating that a previously
called-for store operation has now been executed, and then copies the address
into A_z, sets the extra bit of A_z to one and proceeds to the next control group.

When arithmetic control encounters a store instruction, it waits
until the end of any memory operation then in progress, copies the word into
Z and compares A_z with A_x and A_y. If A_z = A_x, for example, then Z is copied
into X. After this possible "updating" operation, it signals the core
memory to write Z into the address given by A_z. When the write has been
completed, the core memory resets the extra flipflop of A_z to zero.
Arithmetic control is free to execute further control groups as soon as the
core memory action has been initiated, and advanced control is prevented from
using the core memory by a "memory busy" signal.
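
The steps taken by arithmetic control for a store instruction can be
sketched as below. The dictionary keys and the function name are
hypothetical, and the sketch treats the write as instantaneous, so the
"memory busy" interval is not modeled.

```python
def execute_store(word: int, regs: dict, core: dict) -> None:
    """Carry out a deferred store whose address is already held in A_z."""
    regs["Z"] = word                  # the word to be written goes to Z
    address = regs["A_z"]
    # "Updating": keep X or Y consistent if it holds a copy of this location.
    if regs["A_x"] == address:
        regs["X"] = word
    if regs["A_y"] == address:
        regs["Y"] = word
    core[address] = word              # core memory writes Z to the address
    regs["A_z_extra_bit"] = 0         # cleared when the write is completed
```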

The Use of X and Y 

Advanced control does a partial decoding of other instructions to
indicate whether

(a) the instruction transfers information between the arithmetic unit
    and a B-line,

(b) an operand is apparently required from the core memory.

If (a) holds, advanced control waits until D_1 = D_2, and then executes
the instruction.

If (b) holds, advanced control compares the address with A_x and A_y.
The purpose of the following rule is to allow the programmer to save
accesses to and from the memory in problems where an operand is used a number
of times in a set of orders. If the address of the storage location referred
to is not equal to either A_x or A_y, that is, X and Y are available to advanced
control for reading or writing, one of these is chosen as follows: If the
latest use of one of these registers was by the instruction: copy previous
operand (that is, order 58), choose that register; otherwise choose the register
not used last. A flipflop, set by each use of X or Y, indicates which register
satisfies this rule. Suppose, for definiteness, that X is chosen. If K_x = 0,
advanced control reads the word from the memory into X, sets A_x equal to the
address, sets K_x = 1, and sets to zero the 14th bit of the current control
group in O_1, O_2, indicating to arithmetic control that the operand is to be
found in X. If K_x ≠ 0, advanced control reads the word into Z, and waits
until K_x = 0 before copying Z into X and performing the other operations
just mentioned.

If the address is equal to one of A_x or A_y, say A_y, advanced control
waits (if necessary) until K_y < 3, and then increases K_y by one, and sets the
14th bit of the current control group in O_1, O_2 to one, indicating that the
operand is to be found in Y.
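
The choice between X and Y can be sketched as a small decision function.
This is an illustration only: the function and argument names are
hypothetical, and the counters K_x, K_y and the waiting they imply are left
out.

```python
def choose_operand_register(address: int, a_x: int, a_y: int,
                            last_used: str, last_was_copy_previous: bool):
    """Pick X or Y for an operand; return (register, needs_memory_read).

    last_used is "X" or "Y", the register named by the flipflop.
    last_was_copy_previous is True if that latest use was by the
    "copy previous operand" instruction (order 58).
    """
    # The operand is already held in a buffer register: no memory read.
    if address == a_x:
        return "X", False
    if address == a_y:
        return "Y", False
    # Otherwise a register must be filled from the core memory.
    if last_was_copy_previous:
        return last_used, True
    return ("Y" if last_used == "X" else "X"), True
```

Choosing the register not used last leaves the most recent operand in place,
so a following instruction that refers to the same address finds it without
a further memory access.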

The use of these features of advanced control is illustrated in
the second part of Section 3.16.2.



3.15 Order Code 

Capital letters refer to registers and addresses, and lower case
letters refer to the contents of registers and memory locations. Unprimed
lower case letters refer to the initial contents, and primed lower case letters
refer to the final contents. For example, a' = m means copy into A the word
in memory location M, and a' = a + m means add into A the word in memory location
M. M is the address after modification. For arithmetic instructions,

    if C_1 = 0 (short instruction)

        M = B        for B = 0, 1, 2, 3
        M = b        for B = 4, 5, ..., 15

    if C_1 = 1 (long instruction)

        M = B + N    for B = 0, 1, 2, 3
        M = b + N    for B = 4, 5, ..., 15

If C_1 = 0 and B = 0, 1, 2, 3, then m is the contents of A, Q, R^ or R^; otherwise
m refers to the core memory. An unmodified address (M = N) is obtained by
setting B = 0, C_1 = 1.

When N refers to a B-line, the name of the B-line is B_N. Thus
b' = b + b_N means add the address from B_N to the address in B-line B.

The subscript R indicates that a number is rounded to single length
accuracy. One rounded multiplication instruction is described by
a' = (a·m)_R, q' = a. The initial contents of A is transferred to Q and
multiplied by m. During multiplication the digits of Q are circulated, and
digits are not shifted from A into Q.

The symbol (aq) designates the value of the double length number in
the AQ registers, (aq) = a + 2^{-51} q̄, where q̄ consists of q with a zero sign
digit.

R is an extra register in the arithmetic unit, possibly one-half of
the shifting number register. When the contents of the accumulator are replaced
by another number (orders 1, 2, 3, 4), the previous contents are then copied
into R. This facilitates the double precision accumulation of products.

In the accumulator A, intermediate results are correctly represented
provided that they do not lie outside of the range -2 < a < 2. A number is
said to have overflowed when it lies outside of the range -1 < a < 1. An
overflow indicator will be set if the result of an arithmetic operation is in
a state of unassimilated overflow, during improper division, if an
unassimilated overflow is detected during arithmetic left shift, or if
assimilation causes overflow. The overflow indicator will be cleared by any
logical instruction ("and", "exclusive or", "not" or logical shift left or
right), by an arithmetic shift right of at least one place, or by a "test and
clear overflow" instruction. An attempt to store a number into the core memory
when the overflow indicator is set causes the computer to stop.

There are two categories of shift orders, arithmetic shifts and
logical shifts. An arithmetic shift multiplies or divides a or (aq) by 2^M;
that is, the sign digit of A is duplicated during right shift, and digits are
shifted around the sign digit of Q. A logical shift considers (aq) to consist
of 104 consecutive digits, which are translated to the left or right. Zeros
are inserted into the left hand end of A on right shift, and on left shift overflow
is not set. Double length logical shifts shift into the sign digit of Q.
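
The difference between the two kinds of right shift can be illustrated on a single 52-digit register. The sketch below is an assumption-laden illustration: it models only the accumulator (not the double length AQ pair), holds the register as an unsigned Python integer, and the function names are hypothetical.

```python
WORD = 52                      # digits per register, as implied by the 104-digit (aq)
MASK = (1 << WORD) - 1


def arithmetic_shift_right(a, m):
    """Shift right m places (0 <= m <= WORD), duplicating the sign digit."""
    sign = (a >> (WORD - 1)) & 1
    shifted = (a & MASK) >> m
    if sign:
        # Fill the m vacated left-hand positions with copies of the sign.
        shifted |= MASK & ~(MASK >> m)
    return shifted


def logical_shift_right(a, m):
    """Shift right m places, inserting zeros at the left-hand end."""
    return (a & MASK) >> m
```

For a word whose sign digit is one, the arithmetic shift keeps the sign digit set while the logical shift clears it; for a word whose sign digit is zero the two agree.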

Stops 

It is preferable if, before stopping, the computer prints information 
telling why it has stopped. This could be accomplished by reserving a small 
portion of the core memory for a multiple- entry printing routine. The various 
stop conditions cause the contents of the control counter to be copied into 
B^, and cause automatic jumps to distinct entries to this printing routine. 
After printing b.^, and other information pertinent to the stop, the computer 
either encounters an absolute stop, or the program may continue.



(1) See Chapter 8 for a discussion of assimilated and unassimilated overflow.
If the function digits of F consist of seven zeros or seven ones, the
computer will stop. The purpose of this is to stop as soon as possible if
control begins to execute data as instructions, since many fixed point data are
small positive or negative numbers. Similarly any unassigned value of F causes a stop.
If there are parity checks on the drum or core memories, the wrong parity should
cause a stop.

Except as otherwise noted, the following orders are short or long
depending on whether C_1 = 0 or 1.

Arithmetic orders

 1.   a' = m,             r' = a
 2.   a' = -m,            r' = a
 3.   a' = |m|,           r' = a
 4.   a' = -|m|,          r' = a
 5.   a' = a + m
 6.   a' = a - m
 7.   a' = a + |m|
 8.   a' = a - |m|
 9.   a' = 2m
10.   a' = -2m
11.   a' = a + 2m
12.   a' = a - 2m
13.   a' = 2^-M
14.   a' = -2^-M
15.   a' = a + 2^-M
16.   a' = a - 2^-M
17.   a' = (a · m)_R,     q' = a
18.   a' = (q · m)_R,     q' = q
19.   (aq)' = a · m
20.   (aq)' = q · m
21.   (aq)' = (a · m) + (rq)         add product
22.   (aq)' = (q · m) + 2^-51 a      semi-accumulative multiplication
23.-25.  Division orders: a' = remainder, q' = ... The sign of the remainder
      agrees with the sign of the divisor. If the divisor is positive, the
      remainder is less than the divisor. If the divisor is negative, the
      remainder is greater than or equal to the divisor.
26.   a' = 2^M a,    q' = q
27.   a' = 2^-M a,   q' = q
      M can be positive, negative or zero. For 26, 27 |M| < 64, and
      for 28-31 |M| < 128.
28.   (aq)' = 2^M (aq)
29.   (aq)' = 2^-M (aq)
30.   set a = 0, then 28 above
31.   set a = 0, then 29 above
32.   q' = m
33.   a' = m, m' = a (exchange)
34.   a' = m' = a + m
35.   a' = m' = a - m
36.   a' = a, m' = a + m
37.   a' = a, m' = a - m
38.*  a' = 2^k a, b' = b - k, where 1/2 ≤ |a'| < 1. This is a single length
      standardize instruction. If a = 0, k = 64 and a' = 0.
39.*  (aq)' = 2^k (aq), b' = b - k, where 1/2 ≤ |(aq)'| < 1.
      If (aq) = 0, k = 128 and (aq)' = 0.
40.*  (aq)' = 0

*Denotes short orders
Floating Point Orders (a_F is the contents of the accumulator considered as a
floating point number, and m_F is the floating point number held in M.)

41.   a_F' = m_F
42.   a_F' = -m_F
43.   a_F' = a_F + m_F
44.   a_F' = a_F - m_F
45.   a_F' = ...
46.   a_F' = a_F · m_F
47.   a_F' = a_F ÷ m_F
48.   a_F' = 2^M a_F      (add M to the exponent)
49.   a_F' = 2^-M a_F     (subtract M from the exponent)
50.   m_F' = a_F
51.*  a_F' = (a, b)       (Copy the contents of the designated B-line into
                          the accumulator exponent register, that is, convert
                          from fixed point to unstandardized floating point.)
52.*  Standardize a_F, counting in the exponent register.
53.*  b' = exponent, a_F' = a_F. This can be used to convert a floating point
      result to fixed point.

*Denotes short orders

Store Orders (see also 32-37, 50, 76, 78)

54.   m' = a
55.   m' = q
56.   m' = (aq)_R = a', q' = 0
57.   m' = a, a' = 0
58.   m' = immediately preceding operand. Commonly M = 1, 2, or 3. This
      allows the contents of the X or Y register to be transferred to Q,
      R_2 or R_3 without affecting the accumulator.
59.   "store and carry". Assimilate a, and store its non-sign digits in
      location M with a zero sign digit.
      Then a' = 0, 2^(...), -2^(...), or -2^(...) depending on whether
          a ≥ 0 and no overflow,
          a ≥ 0 and overflow,
          a < 0 and no overflow,
          a < 0 and overflow.



Logical Orders

60.   a' = a ⊕ m          ("exclusive or")
      q' = a ∧ m          ("and" or digitwise product)
61.   a' = ~m = -m - 2^-51    ("not" or digitwise complement)
62.   a' = 2^M a          overflow is not set
63.   a' = 2^-M a         the sign digit is not duplicated
64.   (aq)' = 2^M (aq)    (aq) is considered to be 104 consecutive
65.   (aq)' = 2^-M (aq)   digits without regard for sign.

B-line Orders

66.+  b' = N              The unmodified address consisting of the
67.+  b' = -N             second control group is used as an
68.+  b' = b + N          integer operand.
69.+  b' = b - N

70.+  b' = b_N            The second control group is interpreted
71.+  b' = -b_N           as the name of a second B-line.
72.+  b' = b + b_N
73.+  b' = b - b_N
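
The identity quoted for the "not" order, that the digitwise complement of m equals -m less one unit in the last place, can be checked directly on 52-digit two's complement words. In this sketch words are held as unsigned integers and the unit in the last place is the integer 1 (corresponding to 2^-51 if the word is read as a fraction); both conventions, and the function names, are assumptions made for illustration.

```python
WORD = 52
MASK = (1 << WORD) - 1


def to_signed(word):
    """Interpret an unsigned 52-digit word as a two's complement integer."""
    return word - (1 << WORD) if word >> (WORD - 1) else word


def digitwise_not(word):
    """Complement every digit of the word."""
    return ~word & MASK


def check_not_identity(word):
    """True when not(m) == -m - 1 in units of the last place."""
    return to_signed(digitwise_not(word)) == -to_signed(word) - 1
```

The check holds for every word, including zero and the most negative value, because complementing all digits is the same as negating and then subtracting one unit.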
74.*  b' = b - 1
75.+  b' is the 13-bit group from word N (unmodified), position C.
76.+  store b in word N, position C.
77.+  set B, B+1, B+2, B+3 to be the four 13-bit groups of the word at N.
      B is restricted to be one of 4, 8 or 12.
78.+  store the contents of B, B+1, B+2, B+3 in word N. (B = 4, 8, or 12).
79.*  a' = b · 2^-51 (b is considered a positive integer)
80.*  a' = a + b · 2^-51
81.*  a' = a - b · 2^-51
82.*  b' = least significant 13 bits of a
83.*  b' = least significant 13 bits of q

+Denotes long orders
*Denotes short orders

B-Conditional Jump Orders

84.*  b' = b + 1, if b' ≠ 0 jump to the leftmost control group of:
          this word if C_2 = 0,
          the preceding word if C_2 = 1.
      This is called the "short loop" order.
85.*  b' = b - 1, otherwise similar to 84.
86.+  b' = b + 1, jump to word N, position C if b' ≠ 0
87.+  b' = b - 1, jump to word N, position C if b' ≠ 0
88.+  Jump if b ≠ 0
89.+  Jump if b = 0
90.+  Jump if the most significant bit of b is zero
91.+  Jump if the most significant bit of b is one

+Denotes long orders
*Denotes short orders
Unconditional Jump Orders

92.+  Jump to word M position C. (M is the modified address)
93.+  Set B_15 to be the address of the next instruction, and jump to word
      M position C. This is an "enter subroutine" order.
94.   Jump to the instruction given by b_15. The effect of executing a
      94 order is to transfer control to the first instruction following
      the preceding 93 order. 94 is a "leave subroutine" order.

+Denotes long orders

A-Conditional Jump Orders

95.+  The test is determined by the value of the integer B. Advanced
      control is to read out the word into Z on the assumption that the
      jump will probably be executed.

      If B = 0, jump if a ≥ 0
         B = 1, jump if a < 0
         B = 2, jump if a = 0
         B = 3, jump if a ≠ 0
         B = 4, jump if overflow, and reset indicator
         B = 5, jump if no overflow, otherwise reset indicator
         B = 6, jump if overflow
         B = 7, jump if no overflow

96.+  Similar to 95, except that advanced control assumes that the jump
      will not be executed, and does not read the word in advance.

+Denotes long orders

Miscellaneous

97.   a' = a, set overflow if a - m is negative. This is a comparison
      of the sizes of two numbers which does not alter the contents of
      any register.
98.*  Modify the next instruction by the contents of B-line B, in addition
      to any other modification called for.
99.*  Read, into A, a word whose bits define the states of the various
      switches on the machine.
100.  Input-output and drum instructions. These have not yet been assigned.

*Denotes short orders



3.16 Programming Examples

Suppose that two vectors a, b are stored in core memory locations
A_0, A_0+1, ..., A_0+n-1; B_0, B_0+1, ..., B_0+n-1, and that λ is a number whose
magnitude is less than 1. It is required to compute the vector c = a + λb,
and store it in locations C_0, C_0+1, ..., C_0+n-1.

Assume
    q   = λ
    b_4 = A_0
    b_5 = B_0
    b_6 = C_0
    b_7 = -n mod 2^13

The program is

    F                                 B    C     Remarks
    a' = (q × m)_R, q' = q            5    01    leftmost control group
                                                 of a word
    a' = a + m                        4    01
    m' = a                            6    01
    b' = b + 1, jump unless b' = 0    7    00    short loop order

The inner loop requires one word consisting of four short instructions.
The input routine can assign storage locations to the program in such a way that
the first instruction of this loop is a left-hand control group. If an overflow
test were included in this loop it would still require only 1-1/2 words.
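
The behaviour of this four-instruction loop can be imitated as follows. The sketch is illustrative only: it assumes that C = 01 on the first three instructions advances the named B-line by one after use (as the advanced control entries "Add 1 to B-line ..." in the timing table suggest), it ignores rounding and overflow, and the function and variable names are hypothetical.

```python
B_MODULUS = 2 ** 13  # B-lines are treated as 13-bit counters


def vector_loop(memory, a0, b0, c0, n, lam):
    """Compute c = a + lam * b in place, following the four short orders."""
    b = {4: a0, 5: b0, 6: c0, 7: (-n) % B_MODULUS}
    q = lam
    while True:
        acc = q * memory[b[5]]          # a' = (q x m)_R, q' = q   (B = 5)
        b[5] += 1
        acc = acc + memory[b[4]]        # a' = a + m               (B = 4)
        b[4] += 1
        memory[b[6]] = acc              # m' = a                   (B = 6)
        b[6] += 1
        b[7] = (b[7] + 1) % B_MODULUS   # short loop order         (B = 7)
        if b[7] == 0:
            break
    return memory
```

Starting the count at -n mod 2^13 means the counter reaches zero after exactly n passes, so no separate comparison against n is needed.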




The table on the following page shows the time sequence of events
for the core memory, advanced control and arithmetic control for the first few
instructions encountered in this loop. Each typewritten line, reading down,
represents 1/4 µs of time, and events on the same horizontal line take place
simultaneously. The following assumptions are made:

1. The last 0.5 µs of a core memory read operation, after the word
   has been read out and while the core memory is regenerating the
   location, is available for advanced control to proceed with
   further non-memory operations.

2. Operation times are: 1.5 µs for core memory read or write,
                        3.5 µs for multiplication,
                        0.25 µs for simple operations such as
                        addition, transfer, jump.
   These times differ from the times that are expected to be taken
   by the proposed computer to do the operations referred to. However,
   the diagram is simpler to draw with these values of the
   times, and corresponding diagrams can be made for any values of
   these quantities.

3. At the beginning (the top of the page), the core memory, advanced
   control and arithmetic control are all waiting. Normally,
   advanced control has performed operations ahead of arithmetic
   control, so this assumption is conservative.



Time during which one unit must wait is marked in the table.

The first and second executions of the loop and part of the third execution,
by arithmetic control, are shown. The first and last executions differ from
the intermediate ones, which are all similar to the second. After the first
execution of the loop, the core memory operates continuously, the arithmetic
unit must wait only 0.25 µs out of every 4.5 µs, and advanced control, although
supervising memory operations and performing B-line arithmetic, is idle a
considerable fraction of the time. This is not inefficient, because the purpose
of advanced control is to maximize the duty cycles of the memory and the
arithmetic unit, which it certainly does for this problem.
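
The steady-state figures can be checked with a short calculation. The sketch assumes the operation times listed in assumption 2 (1.5 µs per memory read or write, 3.5 µs per multiplication, 0.25 µs per simple operation) and that each pass of the loop makes three memory references (read b_i, read a_i, store c_i); the function name is hypothetical.

```python
MEMORY_CYCLE = 1.5   # microseconds per core memory read or write
MULTIPLY = 3.5       # microseconds per multiplication
SIMPLE = 0.25        # microseconds per addition, transfer or jump


def steady_state():
    """Return (memory time, arithmetic time, arithmetic wait) per loop pass."""
    memory_time = 3 * MEMORY_CYCLE              # read b_i, read a_i, store c_i
    arithmetic_time = MULTIPLY + 3 * SIMPLE     # multiply, add, store, jump
    return memory_time, arithmetic_time, memory_time - arithmetic_time
```

Under these assumptions the memory is the limiting unit: it is busy for the whole of each pass, while the arithmetic unit is idle for the small difference between the two totals.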
[Table: time sequence of events, with columns Time (µs), Core Memory,
Advanced Control and Arithmetic Control. The entries recoverable from each
column, in order, are listed below.]

Time marks (µs): 1.0, 1.5, 2.5, 3.0, 4.0, 4.5, 5.5, 6.0, 6.25, 7.75, 8.75,
9.25, 10.25, 10.75, 12.25.

Core Memory: read word of instructions; regenerate; read location B_0;
regenerate; read location A_0; regenerate; read location B_0+1; regenerate;
store z in location C_0; read location A_0+1; regenerate; read location
B_0+2; etc.

Advanced Control: word of instructions to O_1; add 1 to B-line 5; b_0 to X;
add 1 to B-line 4; a_0 to Y; set A_z = address C_0; add 1 to B-line 6; count
in B-line 7 and jump; add 1 to B-line 5; b_1 to X; add 1 to B-line 4; a_1 to Y;
set A_z = address C_0+1; add 1 to B-line 6; count in B-line 7 and jump;
add 1 to B-line 5; b_2 to X; etc.

Arithmetic Control: multiply q by x; add y to accumulator; store in Z;
obey jump; multiply q by x; add y to accumulator; store in Z; multiply q
by x; etc.
3.16.1 The Scalar Product of Two Vectors

If a, b are n-vectors stored in locations {A_0 + ki}, {B_0 + li},
(i = 0, 1, ..., n-1), form

    a · b = Σ_{i=0}^{n-1} N(A_0 + ki) N(B_0 + li),

where k and l are constants and the notation N(x) means "the fixed point number
held in memory location x". (This problem arises in the multiplication of the
transpose of one rectangular matrix by another matrix, if both matrices are
stored by rows.)

Assume (aq) = 0 at the start of the loop, and

    b_4 = A_0
    b_5 = B_0
    b_6 = k
    b_7 = l
    b_8 = -n mod 2^13

The program is

    F                        B    C     N (3)
    a' = m, r' = a           4    00
    (aq)' = m · a + (rq)     5    00
    b' = b + b_N             4    10    6
    b' = b + b_N             5    10    7
    short loop               8    01

The inner loop is 1-3/4 words. The result is the unrounded double
length sum of products. The first instruction following this loop would be
either "test overflow", or "round and store". If k, l are constant throughout
all uses of the loop, the third and fourth instructions could be of the
b' = b + N type, and only 3 B-lines would be required.

3. In this example the digits under N label a B-line, B_N.
3.16.2 Evaluation of a Polynomial

Form y_n = Σ_{i=0}^{n} a_i t^(n-i), by means of the sequence

    y_0 = a_0
    y_{i+1} = t y_i + a_{i+1},

where {a_i} are stored in consecutive core memory locations beginning at A_0, and

    b_4 = A_0
    b_5 = -n mod 2^13
    q = t

    F                          B    C
    a' = m                     4    01
    a' = (q · m)_R, q' = q     0    00
    a' = a + m                 4    01
    short loop                 5    00

The inner loop itself requires 3/4 word. The instruction
a' = (q · m)_R, q' = q with B = 0, C = 00 forms the product (a · q)_R, and is
required to be a left-hand control group.

An alternative method which illustrates the selection of X or Y by
advanced control assumes:

    b_4 = A_0
    b_5 = -n mod 2^13
    b_6 = address of t

    F                          B    C
    a' = m                     4    01
    a' = (m · a)_R             6    00
    a' = a + m                 4    01
    short loop                 5    00

The inner loop requires 3/4 word. The multiply instruction is
required to occupy a left-hand control group of a word. Although the program
apparently calls for two memory references per execution of the loop, one for
a_i and one for the number t, in fact t is read out just once and held
permanently in one of the registers X or Y. This is true because the two
last-used operands are still available, and t is always one of them. The program
calls for operands to be used in this order:

    a_0, t, a_1, t, a_2, ...

and when advanced control decides which register (X or Y) is to hold a_{i+1}, it
chooses the register not used last, that is, not containing t, so t is held
permanently throughout the execution of the loop.
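
The recurrence used by both programs is the familiar nested multiplication. A minimal sketch, ignoring the rounding of each product to single length and using a hypothetical function name, is:

```python
def evaluate_polynomial(coefficients, t):
    """Return a_0 t^n + a_1 t^(n-1) + ... + a_n by nested multiplication.

    coefficients -- [a_0, a_1, ..., a_n], in the order they are stored
    """
    y = coefficients[0]                 # y_0 = a_0
    for a_next in coefficients[1:]:
        y = t * y + a_next              # y_{i+1} = t * y_i + a_{i+1}
    return y
```

Each pass uses one multiplication and one addition, which is why the inner loop needs only the multiply, add and count instructions.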



3.16.3 Continued Fraction

Form F_n(x), the n-th convergent of

    a_0 + b_0 x / (a_1 + b_1 x / (a_2 + ...)).

Define

    y_0 = a_n
    y_1 = b_{n-1} x + a_{n-1} y_0

and, for i ≥ 2,

    y_i = y_{i-2} x b_{n-i} + y_{i-1} a_{n-i}.

Thus

    F_n(x) = y_n / y_{n-1}.
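
The recurrence can be compared against a direct evaluation of the continued fraction from the innermost term outward. The sketch below uses exact rational arithmetic so that the two results can be tested for equality; it assumes n ≥ 1 and non-zero denominators, and the function names are hypothetical.

```python
from fractions import Fraction


def convergent_direct(a, b, x):
    """Evaluate a_0 + b_0 x / (a_1 + b_1 x / (... / a_n)) from the inside out."""
    n = len(a) - 1
    tail = Fraction(a[n])
    for k in range(n - 1, -1, -1):
        tail = a[k] + b[k] * x / tail
    return tail


def convergent_by_recurrence(a, b, x):
    """Evaluate the same convergent as y_n / y_(n-1)."""
    n = len(a) - 1
    y_older = Fraction(a[n])                       # y_0
    y_newer = b[n - 1] * x + a[n - 1] * y_older    # y_1
    for i in range(2, n + 1):
        y_older, y_newer = (
            y_newer,
            y_older * x * b[n - i] + y_newer * a[n - i],
        )
    return y_newer / y_older
```

The recurrence avoids division until the very end, which is why the machine program needs only multiplications and "add product" orders inside its loop.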
Suppose that a_n, b_{n-1}, a_{n-1}, b_{n-2}, ..., a_1, b_0, a_0 are stored in
consecutive memory locations beginning at the address in B_4, and suppose
that B_5 has -(n-1) initially, and that x is in A.



m' 



1 1 = 



m 



m' = a 
a' = m 

(aq)' = m • a 
a 1 = m, r 1 = a 
(aq) 1 = m • a + (rq) 
m' = (aq). 



'R 



ii = 



m 



m' = a 

a' = (q •„ m) R 
(aq)' = a • m 
a' = m, r' = a 
(aq)' = a • m + (rq) 
— short loop 



B 

3 

4 

2 
4 

3 
4 
2 


2 
2 
4 
3 
2 

4 
5 
2 



C 

00 

01 

00 
01 

00 
01 
00 
00 

00 

00 

01 

00 

00 

01 

01 

00 



Store x in R„ 



Store j Q ( = a n ) in R 2 



b , x 
n-1 



y^ unrounded 



Round y. , 
•'l-l 



q = y. 



i-2 



i-1 



y. „ x b . 
d 1-2 n-i 



Add y. t a .to obtain 
J l-l n-i 

y. unrounded 
l 



F n (x) = y„/y„ n 
n n/ n-1 



This program requires 4 words, and its inner loop is exactly two words.



3.16.4 Reverse the Digits of a Word

Suppose that x, the number in the accumulator, has digits
x_0, x_1, ..., x_51. It is required to form instead y, the number whose digits are
x_51, x_50, ..., x_0.

    F                            B    C     N
    b' = -N                      4    10    52
    (aq)' = 2^-M (aq) logical    1    00          shift 1 right
    a' = m, m' = a               2    00          exchange
    (aq)' = 2^M (aq) logical     1    00          shift 1 left
    a' = m, m' = a               2    00          exchange
    short loop                   4    01

The program requires 1-3/4 words: one long instruction and 5 short
ones. The logical shifts are used in order not to set overflow during left
shifts. If the exchange instruction were not available, the inner loop would
require two more instructions, and it would be necessary to use R_3 as well as
R_2.
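
The effect of the loop, moving one digit per pass from the low end of x to the low end of a growing result, can be imitated as follows. This sketch does not model the AQ and R_2 registers individually; it holds the word as an unsigned integer and uses a hypothetical function name.

```python
WORD = 52


def reverse_digits(x):
    """Return the 52-digit word whose digits are those of x in reverse order."""
    y = 0
    for _ in range(WORD):
        y = (y << 1) | (x & 1)   # take the lowest digit of x into y
        x >>= 1                  # logical shift right by one place
    return y
```

Reversing twice returns the original word, which gives a simple check on the loop count of 52.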



3.16.5 Sideways Addition

Count the number of ones in the word in Q and place the result in B_4.

    F                                    B    C     N
    b' = N                               4    10    0     clear B_4
    b' = -N                              5    10    52    set the count
    clear A, (aq)' = 2^M (aq) logical    1    00
    a' = a + b                           4    00
    b' = a                               4    00
    short loop                           5    00

The program requires 2 words, and the inner loop requires one word.
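
The counting loop can be sketched in the same spirit: on each pass the leading digit of Q is shifted out and added to a running count. The word is held as an unsigned integer and the function name is hypothetical; the individual registers are not modelled.

```python
WORD = 52
MASK = (1 << WORD) - 1


def sideways_add(q):
    """Count the ones in a 52-digit word by shifting its digits out one at a time."""
    count = 0
    for _ in range(WORD):
        leading = (q >> (WORD - 1)) & 1   # digit about to be shifted out of Q
        q = (q << 1) & MASK               # logical shift left by one place
        count += leading
    return count
```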



3.16.6 Square Root of Y

If a_0 = 1/2 + (1/2) x and a_{n+1} = (1/2)(a_n + x / a_n), then {a_n} → √x.
Furthermore, it may be shown that if x > 1/4, a_4 is a very accurate
approximation to √x.

This suggests the following method for obtaining √y as fast as possible:
define x = 4^n y, 1/4 ≤ x < 1; then √y = 2^-n a_4, where {a_i} are defined as
before.

The only difficult operation is "4^n", since the computer standardizes
by 2 and not 4, and most of the complexity of the program which follows is due
to this type of standardization. Suppose that y is in A.
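
The method can be tried out in floating point. The sketch below is an illustration rather than a transcription of the program that follows: it scales y by powers of 4 into the range 1/4 ≤ x < 1, forms a_0 = 1/2 + x/2, applies the iteration a fixed number of times (four here, which is an assumption), and rescales by the corresponding power of 2. The function name and the handling of y ≥ 1 are likewise assumptions.

```python
def sqrt_by_scaling(y, steps=4):
    """Approximate the square root of y > 0 by scaling and iteration."""
    if y <= 0:
        raise ValueError("y must be positive")
    x, n = y, 0
    while x >= 1.0:          # bring x below 1 (not needed for fractions y < 1)
        x /= 4.0
        n -= 1
    while x < 0.25:          # standardize by 4 until 1/4 <= x < 1
        x *= 4.0
        n += 1
    a = 0.5 + 0.5 * x        # a_0
    for _ in range(steps):
        a = 0.5 * (a + x / a)
    return a * 2.0 ** (-n)   # sqrt(y) = 2^-n sqrt(x), since x = 4^n y
```

Because x is confined to a fixed range, the starting error is bounded and the same small number of iterations serves for every y.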







F 




B 


C 


N 


b' 




-N 




4 


00 


5 


b' 




N 




5 


00 





a' 




2 k a, b' = b 


- k 


5 


00 




m» 




a 




2 


00 




b« 




" b N 




5 


00 


5 


a' 








5 


00 




• (aq) 


^ 2- M (aq) 




1 


00 


Halve 


b' 




a 




5 


00 




Clear A, (aq)' = 


2 M (aq) 


1 


00 


Double 


b' 




a 




6 


00 




a' 




m 




O 




^ y-*A. 


a' 




2 a 




6 


00 




m' 




a 




O 


uu 


X 


a' 




-M 

2 a 




l 


00 




a' 




a + 2 




l 


00 


a o 


A m' 




a 




3 


00 


stores a. 

l 


a' 




m 




2 


00 


X 


a' 




(i). 




3 


00 




a' 




a + m 




3 


00 




a' 




-M 
2 a 




1 


00 




— short loop 




4 


01 




a' 




2 a 




5 


00 
The program requires 6-1/4 words, and the inner loop is 1-1/2 words.



3.16.7 Matrix Multiplication

Let ||a_ij||, ||b_jk|| be m x n and n x p matrices stored by rows beginning
at memory locations A_0, B_0 respectively. It is required to store the product
||c_ik|| by rows in locations beginning at C_0, given that



and 




The B-registers will be used for the following purposes:

    b_4  = address of a_ij
    b_5  = address of b_jk
    b_6  = address of c_ik
    b_7  = (m)
    b_8  = n
    b_9  = p
    b_10 = (n)
    b_11 = (p)
    b_12 = np-1
    b_15 = link,

where a number in parentheses indicates a counter whose initial value is that
number, and whose final value is zero. b_10 is the count for each scalar product,
b_11 is the count across any one row, and b_7 is the over-all count.



a' = b 
q« = b 
(aq)' = m .- 

b' = q 

b' = b - 1 



-> b' 



N 



r- unconditional jump 



»b' = b - bjj 



Clear AQ 



r=»a' = m, r' 



(aq) 1 



m 



= a 

a + (rq) 



b« = b + b, 



•b' = 

m' = 

b 1 = 

b' = 

b« = 



N 

b - 1 jump unless 
(aq) R 

b - b N 

i 

b - 1 jump unless 
b " b N 



B 


C 


N 




o 


on 






Q 
7 











• 00 




no— 1 to b, „ 

lip x 


12 


00 






12 


00 






11 


10 


9 


p-*(p) 




c (4) 


Vi 


enuer xoop 








waste 1/2 word J 


4 


10 


8 


reset A + kn 
u 


10 


10 . 


8 







00 






4 


01 






5 


00 




inner loop 1-1/4 words 


5 


10 


9 




10 


01 






6 


01 




store c, 
ik 


5 


10 


12 




11 


c 


N 


row count (p) 


5 


10 


9 




7 


c 


N 


over-all count (m) 


15 


00 







b 1 - b - 1 jump unless 

obey link 

The entire program requires 7-3/4 words including calculating the
constants for this triple count. If it were desired to preserve the contents
of the B-lines, about 2 more words would be required.

4. The notation C_N is used here to mean an address indicated by the arrow
which would probably be denoted symbolically by the programmer, and
assigned by the input routine.
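
The addressing pattern behind the triple count can be made explicit with a flat memory in which all three matrices are stored by rows. This sketch shows only the arithmetic and the row-major addresses; it does not reproduce the B-line bookkeeping of the program above, and the function name is hypothetical.

```python
def matrix_multiply(memory, a0, b0, c0, m, n, p):
    """Store the (m x n) by (n x p) product by rows, starting at address c0."""
    for i in range(m):                  # over-all count (m)
        for k in range(p):              # count across one row (p)
            total = 0
            for j in range(n):          # count for each scalar product (n)
                total += memory[a0 + i * n + j] * memory[b0 + j * p + k]
            memory[c0 + i * p + k] = total
    return memory
```

The innermost loop steps through A with stride 1 and through B with stride p, which is the situation treated in section 3.16.1 with constants k and l.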
APPENDIX TO CHAPTER 3

The design of a computer with a 1.5 µs main memory depends critically
on what other equipment is available, and the cost of building two memories,
one for data and one for instructions, or a memory with multiple word
simultaneous read-out. It seems quite possible that work now going on in memory
development will lead to faster memories. If this occurs the design described
above will be modified.

The computer described may be thought of as having a "tactical" data
anticipation system. Advanced control, with no help from the coder, scans a
few instructions ahead, and tries to obtain operands before they are required
by arithmetic control. A "strategic" data-anticipation system, on the other
hand, transfers blocks of data and instructions from the main memory into a
large, fast buffer memory, long in advance of their use. For this, the coder
must somehow indicate what data are wanted, and where they are to be put.

An earlier design based on a 64-word rapid-access buffer memory and
the simultaneous transfer of 4 words between the main memory and this buffer
memory will now be described. Its description is included because it indicates
an alternative method of mitigating the memory access problem. The detailed
description given refers to a computer with an internal memory of 4096 40-bit
words. These numbers differ from those now thought desirable for a computer.
However, since the material is being included for illustrative purposes it was
not thought necessary to modify these numbers.

3.17 Memory

C: Slow memory    4,096-word core

D: Fast memory    64-word diode capacitor

The 64-word memory D will be divided into 8 blocks of 8 words each.
Reading between C and D will be by 4-word half blocks.

The 8 blocks of D will be divided into 3 categories:

1. Block 0 for temporary storage,

2. Blocks 1 and 2 for instructions,

3. Blocks 3, 4, 5, 6 and 7 for data, constants, etc.

Transfers to blocks 0, 3, 4, 5, 6 and 7 will be made by half-block transfers,
4 words at a time. Transfers to blocks 1 and 2 will be made by pairs of 4-word
transfers so that a complete block is filled by a single transfer order.

The 512 blocks of C will be addressed in half-block groups of 4 words
designated by the 10 most significant bits of a 12-bit register.



3.18 Execution of Instructions

All instructions are executed from the 16 locations (32 instructions)
of D_1 and D_2. Instructions are transferred into these blocks in pairs of 4-word
transfers so that a single transfer instruction brings in 16 new instructions.
The control counter is a 12-bit counter that indicates the location in C of the
next instruction pair. The least significant octal digit of the control counter
gives the position within the 8-word block of C, while the 3 most significant
octal digits identify the block (1 out of 512) in C.

A flipflop F_1 is needed to specify whether the next instruction pair
will be taken from D_1 or from D_2. Whenever the least significant octal digit
of the control counter is zero, indicating the beginning of a new block, F_1 is
changed and a new block of 8 words is read into the D_i designated by F_1 (D_1 or
D_2 according as F_1 is 0 or 1, respectively).

A second flipflop, F_2, is required to inhibit the transfer of new
instructions from C to D if a loop of instructions is being repetitively executed.
The control of F_2 is by means of D-jump instructions, i.e., jump instructions
referring to instructions already in D. (For details see the discussion of jump
instructions.) In ordinary operation flipflops F_1 and F_2 agree and are changed
together as determined by the control counter. When a D-jump occurs F_1 is set
to designate the block of the new instruction to be executed, but F_2 is not
changed. As long as F_1 and F_2 disagree, the carry from the least significant
octal digit of the control counter is inhibited and no new transfer from C to
D is allowed. When F_1 and F_2 agree once more, the normal sequencing mode is
resumed.

3.19 Jump Instructions

The ideal jump instruction would always refer only to a core location
but would sense when the wanted instruction is in D and in such a case jump
without requiring an access to C. If, however, the wanted instruction is not in
D, the 8-word block containing it is transferred to D_1, and F_1 and F_2 are set to
zero.

The sensing of the presence of the wanted instruction in D can be done
for certain cases by adding another 3-bit counter and making a comparison between
the jump address, the control counter and this 3-bit counter which, in
conjunction with F_1, is a 4-bit counter specifying a position in D_1 and D_2. If the
jump address is less than the control counter by no more than the 4-bit counter,
the wanted instruction is in D. If the jump address is greater than the control
counter, the wanted instruction may or may not be in D and we cannot tell without
additional equipment.

Therefore two kinds of jump instructions will be provided:

1. C-jumps which have 12-bit addresses and cause the 8-word block of
   C containing the wanted instruction to be placed in D_1.

2. D-jumps which have 4-bit addresses and cause a jump within D_1 and D_2.

The C-jumps may or may not have the automatic sensing described above,
depending upon cost, time, and difficulty experienced by the programmer in
keeping track of instructions in D.
As mentioned in section 3.17, the memory D will be divided into 8
blocks of 8 words. There will be a connection between the B-lines and the
transfer of words from C to D_3, D_4, D_5, D_6 and D_7. This connection is
established through the instructions used to set the B-lines, of which there
are eight designated B_0, B_1, ... B_7. Such instructions have a 3-bit
B-designation and a 12-bit address.

An instruction setting B_i (i = 3, 4, 5, 6, 7) to n may cause the 4
words of the half-block of C designated by the 10 most significant bits of n
to be placed in the first or second half of D_i depending on whether the least
significant of the 10 bits is 0 or 1. The position in the half-block of D_i of
the word designated by n will be given by the two least significant bits of n.

Another way to view the transfer is that the most significant 3 octal
digits of n specify one of 512 8-word blocks of C while the least significant
octal digit of n specifies a position both in the designated block of C and in D_i.

Whenever the modified address is such that the least significant octal
digit is 3, indicating that the first half of D_i is used up, a new access may be
started to fill the second half from C. Similarly, if the least significant
octal digit of the modified address is 7, an access may be started to fill the
first half of D_i from C. Thus the block is automatically kept filled if desired
and 4 back values of current data are always available after the usual starting
procedure. Similarly the current block can be transferred from D to C.

One bit of the B-instruction will be used to inhibit the automatic
transfer from C to D_i when desired. This makes it possible to use the designated
B_i as a counter and to reduce the number of transfers from C.
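
The way a 12-bit address n is split for these transfers can be sketched as follows. The function name and the form of the returned tuple are assumptions made for illustration; the field boundaries follow the description above.

```python
def decompose_address(n):
    """Split a 12-bit address into (block, half, position in half, position in block)."""
    if not 0 <= n < 4096:
        raise ValueError("address must fit in 12 bits")
    block = n >> 3                  # 3 most significant octal digits: 1 of 512 blocks
    half = (n >> 2) & 1             # least significant of the 10 high bits
    position_in_half = n & 0b11     # two least significant bits
    position_in_block = n & 0b111   # least significant octal digit
    return block, half, position_in_half, position_in_block
```

For example, the octal address 1234 lies in block 123 (octal) at position 4, which is the first word of the second half-block.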



3.21 Basic Form of Instructions

The word length is assumed to be 40 bits with two 20-bit instructions
per word. Instructions are divided into four basic categories which are
distinguished by two bits:

    00    B-line

    01    Transfer and C-Jump

    10    Arithmetic and D-Jump

    11    Input-Output



3.22 B-Line Instructions

These instructions will have a 12-bit C address, a 3-bit designation
for B_i and D_i, a bit to inhibit the automatic transfer if desired, two bits for
the function and two bits for the category.

    Category    Function    Inhibit    B_i or D_i    Address
    2 bits      2 bits      1 bit      3 bits        12 bits

3.23 C-Jump Instructions

These instructions will have a 12-bit C address, a 2-bit category
number and 6 bits for the function.

    Category    Function    Address
    2 bits      6 bits      12 bits
3.24 Arithmetic Instructions

These instructions will require a 6-bit address (3 bits to designate
B_i and D_i and 3 bits to give position in D_i), 2 bits for the category and 8 bits
for the function. This leaves 4 bits unassigned. One of these will be used to
augment the 6-bit address to permit specifying a shift of up to 127 places and
the other 3 are left unassigned.

    Category    Function    (unassigned)    Address
    2 bits      8 bits      4 bits          6 bits

3.25 Input-Output Instructions

No consideration has been given to these.

3.26 Register Arrangements

The assumption is made that only A and Q are used, but the
interconnections are different from those in the Illiac. The following things are
assumed:

1. Certain instructions cause automatic transfers between A and Q,
   including interchanging them.

2. In most multiply instructions one operand is in A and is
   transferred to Q before the multiplication begins.

3. In division the dividend is either A or AQ and the quotient is
   put in A.

4. It is possible to shift A without shifting Q.

5. The "add product" instruction is an exception to the use of A
   and Q and requires one extra memory register.
3.27 Instruction Code

The examples use a notation that is self-explanatory except for the
arithmetic instructions, which are interpreted as follows. The two digits are
octal digits, the first referring to a block and B-line (i = 3, ..., 7) or
block alone if i = 0, 1, 2. The second digit refers to position in the block.
Thus, for example, the instruction

    42 → Acc

means that after the position (here 2 in the instruction) has been modified by
$B_4$, place the contents of the appropriate position of block $D_4$ in the
accumulator.

The symbols Tc and Tu refer to conditional and unconditional C to D
transfers, a conditional transfer being an automatic one dependent upon the
modified address as described on page 71. A conditional transfer refers to the
next block in C while an unconditional transfer refers to the block specified
by the modified address. Similarly Sc and Su refer to conditional and
unconditional writes from D to C.

The instructions used in the examples have been the following ones.
It is expected, of course, that there will be many others.

1. B-line instructions

    a. Set $B_i$ positively or negatively
    b. Add or subtract to or from $B_i$
    c. Count $B_i$
    d. Jump if $B_i \neq 0$ or if $B_i = 0$

2. Arithmetic instructions

    a. Clear add and subtract
    b. Hold add and subtract
    c. C(n) · A
    d. Add product
    e. C(n) · Q, rounded or unrounded
    f. C(AQ) ÷ C(n)
    g. Shift
    h. Store, rounded or unrounded

3. Jump instructions

    a. Jump if C(A) ≥ 0 or if C(A) < 0



3.28 Programming Examples

Matrix Multiplication

    $c_{ij} = \sum_k a_{ik} b_{kj}$,        i, j, k in 0 ≤ t ≤ n - 1

A stored by rows, then $a_{ik}$ in (a + ni + k)
B stored by columns, then $b_{kj}$ in (b + nj + k)
C stored by rows, so $c_{ij}$ goes to c + ni + j

    $\sum_k$ N(a + ni + k) N(b + nj + k) → c + ni + j,   then j + 1 → j

i remains constant, unless j = n, then i + 1 → i.

The running addresses are a', b', c', where a' begins at a, b' at b,
c' at c, and the counts and addresses are held in the B-lines as follows:

    Count     B-line     B-line     Address
    k - n     B_1        B_4        a'
    j - n     B_2        B_5        b'
    i - n     B_3        B_6        c'
        -n → B_3
        a → B_4                     Tu
        c → B_6
    →   -n → B_2
        b → B_5                     Tu
    →   -n → B_1
        clear AQ
    →   40 → Acc                    Tc
        50 Add product              Tc
        Count B_1, B_4, B_5
        jump if B_1 ≠ 0
        60 Store rounded            Sc
        B_4 - n → B_4               Tu
        Count B_2, B_6
        jump if B_2 ≠ 0
        B_4 + n → B_4               Tu
        Count B_3
        jump if B_3 ≠ 0
        60                          Su

Can be trivially altered to do (m x n) · (n x p).
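
The storage convention of this example (A by rows, B by columns, C by rows, all in one linear store) can be sketched in Python as follows. The function name `matmul_flat` and the use of a Python list for the store are illustrative assumptions; the sketch shows only the address arithmetic, not the B-line counting or the block transfers of the machine program.

```python
def matmul_flat(mem, a, b, c, n):
    """Form C = A * B for n x n matrices held in one flat store.

    a_ik is at mem[a + n*i + k]   (A stored by rows)
    b_kj is at mem[b + n*j + k]   (B stored by columns)
    c_ij goes to mem[c + n*i + j] (C stored by rows)
    """
    for i in range(n):
        for j in range(n):
            acc = 0                      # "clear AQ"
            for k in range(n):           # inner loop: "add product"
                acc += mem[a + n * i + k] * mem[b + n * j + k]
            mem[c + n * i + j] = acc     # "store"
    return mem
```

Because B is stored by columns, both operands of the inner loop are read from consecutive addresses, which is what allows the machine program to advance a' and b' by simple counting.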



Evaluation of Polynomial

    $P(x) = \sum_{i=0}^{n} a_i x^{n-i}$

define $b_0 = a_0$

    $b_{i+1} = x b_i + a_{i+1}$,

    $b_n = P(x)$.
Suppose x in Acc and $a_i$ in a + i when a, n are known

        Store Acc in 00
        -n → B_1
        a → B_3                     Tu
        30 → Acc                    Tc
        count B_3
    →   00 multiply
        30 Add                      Tc
        count B_1, B_3
        jump if B_1 ≠ 0
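
The recurrence $b_0 = a_0$, $b_{i+1} = x b_i + a_{i+1}$ defined for this example can be sketched in Python as follows. The name `evaluate_polynomial` and the list argument are illustrative; the coefficients are taken in the order $a_0, a_1, \ldots, a_n$, with $a_0$ multiplying the highest power of x, as in the definition of P(x).

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate P(x) = sum of a_i * x**(n - i) by b_{i+1} = x*b_i + a_{i+1}."""
    b = coefficients[0]                  # b_0 = a_0
    for a_next in coefficients[1:]:      # one multiply and one add per step
        b = x * b + a_next
    return b                             # b_n = P(x)
```

For the coefficients 2, 3, 4 and x = 5 the steps are 2, then 2·5 + 3 = 13, then 13·5 + 4 = 69, which equals 2·25 + 3·5 + 4.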



Square Root

y in AQ

define $a_0 = 1/2 + y/2$

    $a_{i+1} = (y/a_i - a_i)/2 + a_i$;   stop at k when $y/a_k - a_k \geq 0$.

    10    1/2    (constant)
    11

        Acc → 00                    (entry)
        Q → 01
        shift right 1
        add 10
        jump
        02 add
    →   02 store
        00 → A
        01 → Q
        02 divide
        02 subtract
        shift right 1
        jump if A < 0
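
The iteration of this example can be sketched in Python with integer (fixed-point) arithmetic, which mirrors a fractional machine and makes the stopping test $y/a_k - a_k \geq 0$ well defined. The scaling by `frac_bits`, the name `fixed_sqrt` and the treatment of y = 0 are assumptions of the sketch, not details given in the text.

```python
def fixed_sqrt(y_scaled, frac_bits):
    """Square root of a fraction 0 <= y < 1 held as y_scaled = y * 2**frac_bits.

    Follows a_0 = 1/2 + y/2, a_{i+1} = (y/a_i - a_i)/2 + a_i and stops
    when y/a_k - a_k >= 0. The result is scaled by 2**frac_bits.
    """
    if frac_bits < 1:
        raise ValueError("at least one fractional bit is needed")
    one = 1 << frac_bits
    if not 0 <= y_scaled < one:
        raise ValueError("y must be a fraction in [0, 1)")
    if y_scaled == 0:
        return 0                      # avoids dividing by a vanishing iterate
    dividend = y_scaled << frac_bits  # double-length dividend, as with AQ
    a = (one + y_scaled) >> 1         # a_0 = 1/2 + y/2
    while True:
        difference = dividend // a - a    # y/a_i - a_i
        if difference >= 0:               # the stopping condition
            return a
        a += difference >> 1              # shift right 1, then add a_i
```

Since $a_0 \geq \sqrt{y}$, the iterates decrease toward the root and the difference stays negative until the root is reached to the last bit, so the single sign test serves as the convergence check.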
Relaxation

    a  o
    b  o  o
    c  o

    1/4 (N(a) + N(b) + N(b + 2) + N(c)) → b + 1

stop after doing n.

        a → B_3                     Tu
        b → B_4                     Tu
        c → B_5                     Tu
        (42 → Acc                   Tu)
        -n → B_1
    →   30 → Acc                    Tc
        40 add
        42 add                      Tc
        50 add                      Tc
        shift right 2 (single length)
        41 store rounded            Sc
        Count B_1, B_3, B_4, B_5
        jump if B_1 ≠ 0
        (40 → Acc                   Su)



Relaxation: Extrapolated Liebmann

    a  o
    b  o  o
    c  o

    N(b + 1) + 1/4 (N(a) + N(b) + N(b + 2) + N(c) - 4N(b + 1)) (1 + δ) → b + 1

    10    δ    (constant)

        a → B_3                     Tu
        b → B_4                     Tu
        c → B_5                     Tu
        (42 → Acc                   Tu)
        -n → B_1
    →   30 → Acc                    Tc
        40 add
        42 add                      Tc
        50 add                      Tc
        shift right 2
        41 subtract
        00 store rounded
        10 multiply
        00 add
        41 add
        41 store rounded            Sc
        count B_1, B_3, B_4, B_5
        jump if B_1 ≠ 0
        (40 → Acc                   Su)
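
Both relaxation examples can be sketched with one Python function, since the extrapolated formula reduces to the plain four-point average when the extrapolation constant is zero. The names `relax_row` and `delta`, and the use of three Python lists for the rows beginning at a, b and c, are illustrative assumptions.

```python
def relax_row(row_a, row_b, row_c, n, delta=0.0):
    """One sweep of n points along the row beginning at b.

    Each step replaces N(b + 1) by
        N(b + 1) + (N(a) + N(b) + N(b + 2) + N(c) - 4 N(b + 1)) / 4 * (1 + delta)
    and then advances all addresses by one. With delta = 0 this is the
    plain average of the four neighbours.
    """
    for m in range(n):
        residual = (row_a[m] + row_b[m] + row_b[m + 2] + row_c[m]) / 4 - row_b[m + 1]
        row_b[m + 1] += residual * (1 + delta)
    return row_b
```

The update is done in place, so each point uses the already-corrected value of its left-hand neighbour, as the machine programs do by storing into position 41 before counting the B-lines.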



To Merge 2 Sequences of Numbers

Given k numbers beginning at a
      ℓ numbers beginning at b
store the merged list beginning at c.

        a → B_5                     Tu
        b → B_6                     Tu
        c → B_7
        k → B_1
        ℓ → B_2

— jump (core) 

-> jump if B 1 = 1 f- 

50=— — -»A.cc Tc 
70 store Sc 
count B^, B^, B^ 



50 >Acc 

60 subtract 
— jump if < 



- jump if B 2 = < 

60 $Acc Tc 

70 store Sc 
count B^ 

jump 



_>jump if B 1 f 

70 Su 
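
The task of this example can be sketched in Python as a conventional two-way merge. The sketch assumes that both sequences are already in ascending order (the machine listing compares the current elements of the two lists, but the text does not state the ordering), and the name `merge_sequences` is illustrative.

```python
def merge_sequences(first, second):
    """Merge two ascending sequences into one ascending list."""
    merged = []
    i = j = 0
    # Compare the current elements and copy the smaller one.
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1
    # One list is exhausted; copy the remainder of the other.
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged
```

The two counters play the part of the counts held in the B-lines: the comparison loop ends when either count is exhausted, after which the rest of the other list is copied without further comparison.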






CHAPTER 4
BASIC CIRCUITS



4.1 Introduction

Even assuming that only lumped constant circuits are considered,
it is at once realized that the question "Which are the best circuits for
a computer doing a given job?" has no simple answer. Apart from the tremendous
difficulty of comparing the diversity of circuits satisfying imposed
conditions, it is not even clear what is meant by "best circuits". The
decisive factor may be cost, or size and simplicity (low cost and simplicity
do not necessarily go together); then the interest may be speed and, most of
all perhaps, reliability. Ease of servicing and standardization, too, may
be on the list. A preliminary statement of the position of the Digital
Computer Laboratory would be to say that, at least for the arithmetic unit
and the control, the utmost in speed (compatible with a given design philosophy
and rather severe standards of reliability) has been aimed at. This does
not imply that complexity and cost were neglected, but here the limitation
was the rather vague one of being "reasonable".

In other sections of this chapter the general philosophy of the
Digital Computer Laboratory will be discussed, i.e., the reasons for choosing
"direct-coupled, asynchronous non-saturating transistor circuits". It is,
of course, true that the choice of circuits, design procedure, etc., is
affected by the past experience of the group. There seem to be, however,
good arguments in favor of our choice and these will be explained in some
detail in the following sections.

Another introductory remark may not be out of place. It is well
known that a complete computer can be designed using a single type of basic
logical element, i.e., the AND-NOT (Sheffer stroke) element. Even storage
elements can be produced from AND-NOT's. A more common and more flexible
set is one starting with AND, OR, NOT and a symmetric flipflop of the
Eccles-Jordan or some equivalent design. To the set of basic logical elements,
one invariably has to add purely electronic elements in the form of power
amplifiers (drivers), level restorers, etc. In order to obtain maximum
flexibility and speed, additional logical elements like Schmitt trigger
flipflops (having only one output), EXCLUSIVE OR's, and others may be
introduced. It is also quite possible to choose larger units of circuits,
i.e., to design a separate circuit for every n-input, m-output box defined
by a table of corresponding input and output combinations.
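
The remark that AND-NOT elements suffice, even for storage, can be illustrated with a small Python sketch on 0/1 values. All function names are illustrative, and the latch is an idealized model (gates settle one after the other, with active-low set and reset inputs), not a description of any circuit in this report.

```python
def nand(a, b):
    """The AND-NOT (Sheffer stroke) element on 0/1 values."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def latch(set_n, reset_n, q, q_bar):
    """Two cross-coupled AND-NOT elements used as a storage element.

    set_n and reset_n are active-low; the gates are updated in turn a few
    times so that the pair settles before the state is read.
    """
    for _ in range(3):
        q = nand(set_n, q_bar)
        q_bar = nand(reset_n, q)
    return q, q_bar
```

With both inputs at 1 the latch keeps whatever state it was given, which is the storage property referred to in the text.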

The set of logical elements should often be even more extensive
than the augmented set just mentioned, containing such new designs as
C-Elements. These are desirable in order to realize "speed-independent"
circuits in the sense of Chapter 5, i.e., circuits in which information
flow is not affected by race conditions. It seems especially desirable
to use this kind of circuitry in the rather complex control proposed for
the new machine.

Finally, mention should be made of the fact that the use of binary
logic is not at all mandatory; some gates, to cite an example, are really
3-level devices when we consider the states to be indicated by currents. One
can pursue this idea of ternary logic and build elements of the flow-gating
type described below. In this kind of circuitry the transmission of information
(e.g., from one flipflop to another) is controlled by modifying the
average potential of a whole flipflop. It turns out that this leads to a
considerable economy in hardware. The speed of operation is, however,
somewhat reduced.

4.2 Asynchronous vs. Synchronous Operation

Computer circuits are frequently classified according to the type
of control sequencing used. In a synchronous computer the elementary operations
like shifting, adding, etc., occur at fixed intervals in time, the
moment of occurrence being controlled by a clock. In an asynchronous computer
each operation produces an "end-signal" which in turn initiates the next
operation in a list. If the production of end-signals is not too time-consuming,
an asynchronous machine will be faster. The duration of individual operations
varies (even with the numbers processed) and a clocked synchronous machine
must be adjusted for the maximum possible duration.
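
The comparison made here can be put in numbers with a small Python sketch. It assumes a deliberately simple model: the clocked machine allots every operation the longest duration, while the asynchronous machine spends the actual duration of each operation plus a fixed time to produce its end-signal. The function names and the sample durations are hypothetical.

```python
def synchronous_time(durations):
    """Total time when every operation is given the maximum duration."""
    return len(durations) * max(durations)


def asynchronous_time(durations, end_signal_time):
    """Total time when each operation ends with its own end-signal."""
    return sum(duration + end_signal_time for duration in durations)
```

For durations of 30, 50, 20 and 50 time units and an end-signal time of 5 units, the clocked sequence takes 200 units and the asynchronous one 170; the advantage disappears once the end-signal time approaches the spread between the average and the maximum duration.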

There are systems of various degrees of asynchronism ranging from
those in which the times of action of a set of circuits are simulated in
delay devices (i.e., in which the end-signal or reply-back signal is simply
the previous end-signal delayed by a sufficient amount of time to allow the
set of circuits to operate), to systems in which the operation of each set
of circuits is examined by a checking circuit which gives end-signals if and
only if the operation has really been performed.

A special type of completely asynchronous operation is obtained by
using speed-independent circuits. In this type of circuitry information can
continue to flow only when all previous elements in the chain have reacted
to it. Chapter 5 gives the theory of interconnecting logical elements in
such a way that the set is speed-independent. This means that individual
logical elements too must be designed carefully to meet the conditions of
speed-independence. In particular, last moving points (see section 4.7)
should be provided in order to simplify the design procedures.

Although the absence of a clock has to be paid for in terms of circuitry,
it is believed that the advantage in speed outweighs by far the
additional expenditure. Furthermore, the combination of "asynchronous operation"
and "2-level dc-representation" explained in section 4.4 allows the machine
to be stopped in any state for as long a period as is desired. Checking its
operation is reduced to a simple steady-state voltage check of strategic
points. Also, an asynchronous circuit is probably more reliable, since
changing time constants cannot influence its operation.

Finally, it should be mentioned that in a very fast computer the
time for information to travel from one part of the machine to another will
compare to the binary operation times of the circuits. Signals are delayed
by at least 3.3 mµs per meter. In asynchronous circuitry a knowledge of
this information transit time is not required to assure correct operation.
4.3 Direct-Coupling vs. AC-Coupling

Computer circuits can be ac-coupled or dc-coupled. In the first
case a capacitor or a transformer is used to connect one logical element
to the next. In the second case only resistive networks are used as coupling
elements. It should be noted that dc-coupling is only of interest if each
element is also dc-coupled internally.

The Digital Computer Laboratory has followed the Institute for
Advanced Study by working with circuits which are dc-coupled because it is
believed that circuits of this type have advantages of reliability and
serviceability without paying any direct price in speed. These circuits
do require more engineering and possibly more hardware than ac-coupled
circuits of comparable speed, especially because of drift problems. These,
however, are by no means as serious in a switching circuit as in linear
amplifiers.

Although dc-coupling does not necessarily imply the use of the
two-level dc-representation explained in the next section, this representation
certainly seems to be the natural complement of direct-coupling. This is
especially true when we consider testing a single logical element. A given
combination of dc-inputs then always corresponds to a predetermined combination
of dc-outputs. Checking the operation of a logical element or a
group of such elements therefore becomes quite simple.

4.4 Two-Level DC vs. Pulse Representation

The method of representation of binary states by electrical quantities
(voltages or currents) is another important characteristic of circuits.
In synchronous ac-coupled circuitry the states are usually represented by
the presence or absence of a pulse at given times, these times being determined
by a "clock". In magnetic recording this has been termed "return-to-zero"
representation. In asynchronous dc-coupled circuitry it is usual to
represent the binary states 0 and 1 by voltage or current levels, or to be
more precise, by voltage or current bands. This corresponds to the
"non-return-to-zero" representation of magnetic recordings.




Figure 4.1

Representation of binary numbers by voltages:
a) pulse representation, b) two-voltage-level
representation. This is drawn for the case of
minimum sensing time, i.e., the time required
for sensing the voltage state approaching zero.

The difference between the pulse representation and the two-voltage-level
representation may be illustrated with Figure 4.1. The pulse signal
must be able to go up and down during a clock period. Thus, if a pulse
represents a 0 and the absence of a pulse represents a 1 (the opposite
convention being possible too), the waveform would be as shown and the
duty cycle of the switching element in the case of a sequence of 0's would
be 1/2. If the two-voltage representation is used, 0 corresponding to the
lower voltage level and 1 to the higher one, the signal as shown in b)
changes from one state, 0, to the other state, 1, in the same time interval.
Since the voltage excursion in amplitude is the same for the two cases,
while the excursion time is twice as long for the two-voltage representation,
the charging current and hence the power switched need be only half as great
for the two-voltage representation. However, the switching element must
be capable of being "on" with a duty cycle of 1. Thus, only 1/2 the
power is available from the switching element in this case compared to the
pulse system. These simple considerations show that the speed of a circuit
is, to a first order, the same for the two systems just described.

In practice, the pulse system requires a space between pulses to
allow for timing tolerance. No space is shown in Figure 4.1 which is a
limiting case of shortest possible times. In addition, some time is required
to sense the existence of a signal and this must be added to the
time tolerance. The two-voltage representation does not have a time
tolerance requirement, but it does require a sensing time greater than zero.
If the sensing time required for each system were the same, the time for a
binary information period would be shorter for the two-voltage case by the
amount of the time tolerance.



4.5 Transistors vs. Tubes 

As was indicated in the introduction, the aim of the Digital
Computer Laboratory has been to design the fastest dc-coupled lumped-constant
circuits, using presently available components. Both vacuum
tubes and transistors were considered for the triode elements. Circuits
using transistors appear to be faster than vacuum tube circuits, and the
circuits to be presented later are based upon the use of graded-base
transistors.
To see why transistor circuits are faster than tube circuits one
can observe that in circuits the total time required to change a circuit
element from one state to another is due to a "circuit time" and a "device
operation time". The circuit time will be considered first.

In any simple circuit the time of transition from one potential
state to another is proportional to the load capacitor size and the potential
difference between states and inversely proportional to the current charging
the capacitor. Transistors and vacuum tubes may generally have load
capacitors of about the same size in an operating circuit; usually 10 to 20
µµf. The voltage difference between states must be at least the value
necessary to switch the device: about 0.5 volt for transistors and 5 volts
for tubes. However, the switching signal must be generated at the collector
or plate potential respectively, and usually an attenuation exists between
the point of generation and the switching point. This attenuation and the
tolerances of the units frequently, and perhaps usually, require the generation
of a signal about 10 times the switching signal or about 5 volts for
transistors and 50 volts for tubes. The currents switched by transistors
and tubes are about the same: 5 ma.

One can use the parameters given above to obtain an idea of the
time necessary for a circuit to change from one binary state to the other:

$$ t = k \frac{C V}{i} $$

where   t = time
        C = load capacity
        V = voltage swing
        i = charging or switching current
        k = constant close to 1 but in the range 1/2 to 2 depending
            on the complexities of the circuit.
Circuit time
for transistors

    $C = 10 \cdot 10^{-12}$ farad
    $V = 5$ volts
    $i = 5 \cdot 10^{-3}$ ampere

    $t = \frac{10 \cdot 10^{-12} \cdot 5}{5 \cdot 10^{-3}} = 10^{-8}$ second

    t = 10 mµs

The device switching speed must be added approximately to the
above times. For vacuum tubes this is equal to the transit time of the
electrons from grid to plate, a time of the order of 5 mµs. For transistors
the time for switching is complicated by many factors. For small signal
operation the reciprocal of the 3 db down α-cutoff frequency may be used
provided the transistor is not at any time in saturation (1). For recent
graded-base transistors this frequency is more than 200 mc per second and
hence the time is less than 5 mµs. Thus, the total switching time for a
transistor circuit would be about 10 + 5 = 15 mµs, while the corresponding
time for a vacuum tube circuit is about 105 mµs.
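
The estimates quoted for the two kinds of circuit follow from the relation t = kCV/i, as the following Python sketch shows. The function name and the choice k = 1 are assumptions of the sketch; the component values (10 µµf, 5 or 50 volts, 5 ma, 5 mµs device time) are those given in the text.

```python
def circuit_time(capacitance, voltage_swing, current, k=1.0):
    """Circuit time t = k * C * V / i in seconds (SI units throughout)."""
    return k * capacitance * voltage_swing / current


DEVICE_TIME = 5e-9  # about 5 millimicroseconds for either device

transistor_total = circuit_time(10e-12, 5.0, 5e-3) + DEVICE_TIME   # about 15e-9 s
tube_total = circuit_time(10e-12, 50.0, 5e-3) + DEVICE_TIME        # about 105e-9 s
```

With equal load capacity and switched current, the ratio of the circuit times is simply the ratio of the voltage swings, which is why the tenfold smaller swing of the transistor dominates the comparison.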

The elimination of saturation, which is mandatory for high-speed
operation, as mentioned above, often requires the addition of some circuitry
as does the fact that the input impedance of transistors, even in the
grounded emitter configuration, is low at all frequencies. The latter fact
implies the presence of emitter followers after practically every dc-stepdown
network. This, together with bumping diodes to re-standardize
signals, makes for a somewhat greater complexity of the transistor logical
elements as compared to those using tubes.

(1) S. R. Ray: "The Effects of Saturation on the Switching Response of
Common-Emitter Amplifiers using Diffused-Base Transistors", Digital
Computer Laboratory File No. 208, January 15, 1957.



Circuit time
for tubes

    $C = 10 \cdot 10^{-12}$ farad
    $V = 50$ volts
    $i = 5 \cdot 10^{-3}$ ampere

    t = 100 mµs
Another problem should also be brought up. The lower power-handling
capacity of transistors means that the ratio (noise-power)/(switched-power)
could be higher than that of tubes. Luckily, the average impedance
of transistor circuits is only about 10% of that for tubes and furthermore,
the smaller physical size of the active elements makes for shorter leads.
The noise-power (= average of square of noise voltage/impedance) is, therefore,
probably smaller in the same ratio as the power-handling capacities.

4.6 Reliability and Feasibility of a Transistorized Asynchronous DC-Coupled
Machine

In order to investigate feasibility and reliability questions, a
model computer using 752 transistors and 498 diodes was built by the Digital
Computer Laboratory (2). It operates using 4 words (2 bits for the instruction
add, subtract, multiply, divide; 4 bits for the number to be operated
on) taken from a nondestructive core memory and processing them in a
six-register (AA, QQ, R^R^) arithmetic unit. The circuitry used 1 mc transistors
(GE 2N43 series and TI 202 series) in npn-pnp combinations. The tolerances
on supply voltages were ±5% and on resistors 3%.

The model executed cycles of four orders, each one referring to
the last contents of the accumulator and a number stored alongside the
instruction. This procedure made it necessary to precede multiplication
automatically by an A → Q shift and to follow (again automatically) each
division by a Q → A shift. The 4 words stored in the memory were always
chosen in such a way that the cycle of four orders brings the accumulator
A back to its initial state.

This model machine had many successful runs of several hundred
hours each. The average lifetime of transistors of the above type, extrapolated
from the failure rate in these runs, turned out to be about 100,000
hours, as compared to less than 20,000 hours for tubes in ILLIAC.



(2) W. J. Poppelbaum, F. M. Lurie, G. A. Metze: "TRANCE, A Direct-Coupled,
Asynchronous Computer-Model Using One Microsecond Transistor Circuits",
Digital Computer Laboratory Report No. 73, December 3, 1956.
Two points should be mentioned. First, the fact that a given
transistorized logical circuit may use as many as twice the number of active
elements the corresponding tube circuit would use makes a 1:2 ratio of
tube lifetime / transistor lifetime necessary in order to break even. From
the quoted figures it is seen that the model was only 2.5 times as reliable
as a tube model doing the same job. Second, the cited figures do not refer
to the presently used GF-45011 transistors (see section 4.12). Preliminary
investigations show, however, that the latter type stands up well under
actual operating conditions.
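
The figure of 2.5 can be reproduced with a one-line calculation, sketched here in Python. The sketch assumes that the reliability of a circuit scales with the lifetime of its active elements divided by their number; the function name is illustrative and the inputs are the figures quoted above (100,000 hours, 20,000 hours, twice as many elements).

```python
def relative_reliability(transistor_life, tube_life, element_ratio):
    """Reliability of the transistor circuit relative to the tube circuit.

    element_ratio is the number of transistors used per tube replaced.
    """
    return (transistor_life / tube_life) / element_ratio


ratio = relative_reliability(100_000, 20_000, 2)   # 5 / 2 = 2.5
```

An `element_ratio` of 2 with equal lifetimes would give 0.5, which is the break-even condition stated in the text read the other way round.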



4.7 Tolerances, Critical Levels and Discrimination Levels

Figure 4.2 shows a "single input-output" flipflop. The operation
will be discussed for the strongly idealized case where both transistors
have α = 1 and where there is no voltage drop between emitter and base
when a transistor conducts. Furthermore it will be supposed that the
output impedance $R_0$ is very small compared to the input impedance.

Figure 4.2
Single Input-Output Flipflop
The figure shows the nomenclature used. By definition the flipflop
is in the 0 state when transistor $T_0$ conducts and in the 1 state when $T_1$
conducts. As long as no input signals are impressed the circuit (for the
two states respectively) can be redrawn as in Figure 4.3. It is very easy



Figure 4.3
Active Parts of a Single Input-Output Flipflop
("0" State Circuit; "1" State Circuit)

to determine the voltage u, v, w at the three "cardinal points", base of T^, 
OUT, and collector of ^ for both states : 



For the 
state 



< 



U.= :V 



R l + R 2 



- E 



ft. 



E 
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For the 
1 state 



u = V 



v = 



/u(R .+ R ) + R \ 

5 = v ( ^ j 



E 



/R 2 + * 3 



where ^ is the sum : R^ + R^ . + ..R^ a nd u is the quotient R-^/Rq. 

Consider the output voltage v: in the 1 state it is perfectly
fixed (at least in the idealized case treated here); for the 0 state however
v can take any value between two limits due to the fact that the
power supplies and resistors vary, both from one unit to the next and in
time: all circuit parameters have certain tolerances. If R is the "design
center value" of a resistor with a fractional variation of x, in
extreme cases $R_{max} = R(1 + x)$, $R_{min} = R(1 - x)$. In the case E V for
instance



$$ v_{min} = -E_{max} \, \frac{R_{3\,max}}{R_{1\,min} + R_{2\,min} + R_{3\,max}} $$

$$ v_{max} = -E_{min} \, \frac{R_{3\,min}}{R_{1\,max} + R_{2\,max} + R_{3\,min}} $$
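
The two extreme values can be computed with a short Python sketch. It assumes that the nominal 0-state output is $v = -E R_3/(R_1 + R_2 + R_3)$, the divider expression implied by the extreme-value formulas; the function name and the sample component values are hypothetical.

```python
def output_band(E, R1, R2, R3, supply_tolerance, resistor_tolerance):
    """Return (v_min, v_max) of v = -E * R3 / (R1 + R2 + R3).

    Each parameter is pushed to the end of its tolerance range that makes
    v most negative (v_min) or least negative (v_max).
    """
    def divider(E, R1, R2, R3):
        return -E * R3 / (R1 + R2 + R3)

    high = 1 + resistor_tolerance
    low = 1 - resistor_tolerance
    v_min = divider(E * (1 + supply_tolerance), R1 * low, R2 * low, R3 * high)
    v_max = divider(E * (1 - supply_tolerance), R1 * high, R2 * high, R3 * low)
    return v_min, v_max


# Hypothetical values: 10 volt supply, three 1000 ohm resistors,
# 5 percent supply tolerance and 3 percent resistor tolerance.
v_min, v_max = output_band(10.0, 1000.0, 1000.0, 1000.0, 0.05, 0.03)
```

With these sample values the nominal output of about -3.33 volts spreads into a band from roughly -3.64 to -3.04 volts, which shows how quickly modest component tolerances widen the output band.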



Suppose now that the flipflop is to be set to a new state by
applying an input signal S through a resistance r (representing for instance
the forward drop in a diode used to transfer the information in a gate). If it
is in the 0 state, it will be necessary to apply a signal $S_1$ which is sufficient

1 . Rl R 3 

to cut T_ off: if the circuit has sufficient amplification (i.e. -a = ~ > 1 

2. Tq 

where $r_e$ = emitter-base resistance), the flipflop will then proceed on its own
toward the new state. $S_1$ is determined by the condition that in the left-hand
side of Figure 4.3, u = 0. Similarly the signal $S_0$ required to set
the flipflop to zero is determined by the condition that in the right-hand
side of Figure 4.3, u = 0. This gives
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/ r(R + ,R 2+ .^) / r , 

s o ~ -\ r 3 (r 1 + r 2 ) ) + E (.R! + r 2 J 

Visibly $S_0 \neq S_1$ as long as $r \neq 0$; this means that the flipflop shows hysteresis.
Again the fact that tolerances are not zero means that there are bands ($S_1$ max,
$S_1$ min) and ($S_0$ max, $S_0$ min) in which the input signals necessary to trigger a
flipflop must lie. Since $S_1 > S_0$ under practical conditions, it will be
sufficient to assure that $S > S_1$ max to trigger the 1 state and that $S < S_0$ min
to trigger the 0 state.

To summarize: in order to assure triggering in a chain of flipflops,
the output v of each one must swing under the worst conditions sufficiently
far to trigger the next stage. This corresponds to the following inequalities:

    $v_{min} > S_{1\,max}$    (here $v_{min} = v = 0$)
    $v_{max} < S_{0\,min}$.

$S_{1\,max}$ will be called the upper critical level $c_1$ and $S_{0\,min}$ the lower
critical level $c_0$. Triggering is produced by overswinging these critical levels.
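
The two inequalities amount to a simple worst-case check, sketched here in Python. The function name and the numerical levels are hypothetical; only the form of the test (the "1" output band entirely above the upper critical level, the "0" output band entirely below the lower one) is taken from the text.

```python
def chain_triggers(one_band, zero_band, upper_critical, lower_critical):
    """True if the output bands overswing both critical levels.

    one_band and zero_band are (low, high) pairs of output voltages for the
    1 state and the 0 state; upper_critical is S_1 max, lower_critical is
    S_0 min of the following stage.
    """
    one_min = min(one_band)      # worst case of the "1" output
    zero_max = max(zero_band)    # worst case of the "0" output
    return one_min > upper_critical and zero_max < lower_critical


# Hypothetical levels in volts: the 1 state is fixed at 0, the 0 state
# lies between -3.64 and -3.04.
ok = chain_triggers((0.0, 0.0), (-3.64, -3.04), -0.5, -2.5)
```

Only the inner edge of each output band enters the test, which is why the design effort concentrates on the extreme values of v rather than on its nominal value.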

By reasoning in a similar fashion discrimination levels $d_0$ and $d_1$
could be determined such that a signal in the band $(d_0, d_1)$ cannot possibly
trigger any flipflop of the given set. Generalizing the arguments it is
seen that there are output bands $(v_{max}, v_{min})$ "1" and $(v_{max}, v_{min})$ "0"
as well as input bands $(c_0, c_1)$ outside of which triggering is achieved with
probability one and $(d_0, d_1)$ inside of which no triggering can occur at all.
If the circuit is well designed the general disposition of the bands is as
in Figure 4.4.
Figure 4.4
Input and Output Levels of a Single Input-Output Flipflop
(Input side: dangerous band, non-trigger band, dangerous band, with the
critical levels marked. Output side: "1" band and "0" band, each bounded
by v max and v min.)

4.8 The Last Moving Point Philosophy

Up to now only the static behavior of circuits has been considered;
as soon as rapidly varying signals are taken into account, matters are complicated
by the fact that the purely resistive divider networks described in
the last section become combinations of resistances and capacitances, the
latter being due to the stray capacitance between elements and between elements
and ground. Consider for instance a divider of ratio $k = R_2/(R_1 + R_2)$ as in
Figure 4.5, the resistor $R_1$ having between its ends a capacity $C_1$. Then
the output voltage becomes equal to the input voltage u whenever u varies
fast enough. In other words: the dynamic behavior of a flipflop (or any
other circuit) is different from its static behavior.

Figure 4.5
Voltage Divider with Stray Capacity
(slow variations; fast variations)
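
The effect described here can be sketched numerically in Python by treating the divider as a resistor bridged by a capacity, followed by a resistor to ground. The names `divider_gain`, `R1`, `R2`, `C1` and the sample values are assumptions for illustration; the sketch uses the ordinary sinusoidal steady-state impedance of a resistor in parallel with a capacitor.

```python
def divider_gain(omega, R1, R2, C1):
    """Ratio of output to input voltage at angular frequency omega.

    R1 (bridged by the capacity C1) is the series arm, R2 the arm to ground.
    """
    z1 = R1 / (1 + 1j * omega * R1 * C1)   # R1 in parallel with C1
    return R2 / (R2 + z1)


# Hypothetical values: two 1000 ohm resistors and 10 micromicrofarads.
slow = abs(divider_gain(0.0, 1000.0, 1000.0, 10e-12))    # static ratio k
fast = abs(divider_gain(1e12, 1000.0, 1000.0, 10e-12))   # close to 1
```

For slow variations the ratio is the static value $R_2/(R_1 + R_2)$; for fast variations the capacity bridges the series resistor and the output follows the input almost completely.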

More trouble still is due to the fact that active elements like
tubes and transistors behave in a way different from that predicted from
static characteristics when they are used for rapidly varying signals. Take
a transistor as in Figure 4.6 and examine what happens to the emitter and
to the collector when the input changes suddenly from $u_1$ to a new value (both
being in the conduction range) (3). To do this three things must be noted:

(1) As long as the emitter conducts, there is apparently present
    a large capacitance $C_0$ between it and the base.

(2) When a sudden change occurs in the emitter current (due to
    an impressed signal) this change is propagated through the
    base with a certain delay.

(3) The stray capacitance C between collector and ground absorbs
    a large amount of current and the collector voltage varies
    far from instantaneously even when the signal has reached
    the collector junction.




Figure 4.6
Base Delay and Collector Rise Time in a Transistor



(3) This hypothesis eliminates difficulties due to the Miller effect
caused by the capacitance between collector and base.
The influence of these three factors is shown graphically in the figure
on the preceding page. It is seen in particular that the emitter follows
the base instantaneously. If the incoming signal varies so rapidly that
the collector has no time to react before it goes back to its old value,
the emitter will indicate that the transistor has reacted to the signal while
in reality the collector has not had the time to follow.

Now re-examine the flipflop in the last section. Suppose that,
while it is in the 1 state ($T_1$ conducting), S is suddenly lowered. As soon
as $T_0$ starts conducting (i.e. without delay if S varies infinitely fast),
the output will follow the input. If this flipflop is used to set another
flipflop, it may do so before the amplifying action of the transistors allows
the first flipflop to reach stability, i.e. before it "holds": the new output
thus does not guarantee the new state. This of course is serious in asynchronous
circuitry because the outputs are used to initiate new operations.

The circuit in Figure 4.7 avoids (at least to a certain degree)
the above difficulty by taking the output from a second voltage divider
which indicates the new state of the collector of the amplifier rather
than that of the emitter. Such a point, which by its new output gives
proof (4) of the new state of a circuit, is termed a last moving point. Note
however that even here there is a "feed through path" due to the fact that
the input and the output voltage dividers are tied to the same collector:
by designing these dividers correctly one can however "decouple" IN and OUT
to such an extent that the output never reaches the dangerous bands of
Figure 4.4 by feed-through action alone.

Figure 4.7
Last Moving Point Flipflop

Note that in a symmetric flipflop, which has complementary sensing
outputs and complementary inputs, last moving points exist only in a restricted
sense. A given output terminal is generally a last moving point only for a
given input and for one direction of signal change. This input and this
direction of change has to be specified in each case.

4.9 Characteristic Times : Switching Time and Operation Time 

Synchronous circuits are often defined speedwise by specifying 
the frequency at which they operate. In multiphase systems this does not 
mean that the time in which information traverses a given logical element 
is equal to the reciprocal of this frequency. 

In asynchronous circuits it is obviously impractical to talk about 
a clock frequency. The first idea is to introduce the rise or fall time of 
the signals, but as explained in the last section, this does not take account 
of the delay in the circuit before the signal begins to change. Also, and 
this is of the greatest importance, it is useless even to talk about times 
as long as no last moving points have been provided or determined. Therefore 
characteristic times will only be introduced for last moving point 
circuits. This does not mean that for a non-last moving point circuit 
one cannot, with caution, give a time after which it has reacted to an 
incoming signal. In this case it will be necessary to talk about the 
"estimated reaction time". 

The first important characteristic time is the switching time of 
a circuit. This is the time which elapses between the moment when an 
input signal in the form of a step function (and coming from a zero impedance 
generator) reaches the critical level (5), and the moment when the output 
signal has gone as far as the critical level (of the following circuit) 
toward its new binary value. By the last section this guarantees switching 
of the next storage element. 

(4) It can be argued whether this proof is sufficient: it seems extremely 
difficult to design elements the last moving point of which indicates 
positively that the new state has been reached. 

The operation time is the second important characteristic time. 
This is the time which elapses between the moment when an input signal with 
the rise time characteristic of the type of circuit used (and coming from a 
generator of non-zero impedance in the form of precisely such a circuit) 
reaches the critical level, and the moment when the output signal has gone 
as far as the critical level toward its new binary value. 
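
Both characteristic times are intervals between two level crossings, so they can be read off sampled waveforms in the same way. The following is a minimal sketch under stated assumptions: the function names, the sample waveforms and the ±0.5v critical levels are hypothetical, and linear interpolation between samples is a convenience not taken from the report. Whether the result is a switching time or an operation time depends only on how the input was generated (ideal step from a zero-impedance generator, or the output of an identical circuit).

```python
def crossing_time(times, volts, level):
    """Return the first time at which the sampled waveform reaches `level`.

    Uses linear interpolation between samples; returns None if the level
    is never reached.
    """
    if volts[0] == level:
        return times[0]
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        # A crossing lies in this interval if the samples straddle the level.
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            dt = times[i] - times[i - 1]
            return times[i - 1] + (level - v0) * dt / (v1 - v0)
    return None


def characteristic_time(times, v_in, v_out, in_level, out_level):
    """Time from the input reaching `in_level` to the output reaching `out_level`."""
    t_in = crossing_time(times, v_in, in_level)
    t_out = crossing_time(times, v_out, out_level)
    if t_in is None or t_out is None:
        return None
    return t_out - t_in


# Hypothetical inverting stage, times in millimicroseconds, voltages in volts.
times = [0.0, 5.0, 10.0, 15.0, 20.0]
v_in = [-1.6, 0.0, 1.6, 1.6, 1.6]
v_out = [1.6, 1.6, 1.6, 0.0, -1.6]
delay = characteristic_time(times, v_in, v_out, in_level=0.5, out_level=-0.5)
```

For a non-last moving point circuit the same computation would only give the "estimated reaction time" mentioned above.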

To clarify the important notion of operation time, the case of a 
particular circuit will be discussed. Assume that there is a set of 
absolutely identical last moving point flipflops, each one having identical 
output levels $v_1$ and $v_0$ and each one requiring triggering signals $S_1$ and $S_0$ 
respectively (this corresponds to making $c_1 = d_1 = S_1$ and $c_0 = d_0 = S_0$ in 
Figure 4.4; the discrimination levels will therefore be called $S_1$ and $S_0$). 
Consider how the output of one of these flipflops reacts when its input 
goes from one output level to the other. Figure 4.8 gives the timing diagrams. It is seen that 
the operation time is the total time which elapses between the moment when 
the input starts to change (with a slope equal to that which a typical 
collector gives when the input has precisely this slope) and when the 
output reaches the value necessary to trigger another stage. This time is 
composed of a "base delay" and a percentage of the collector rise time. 
Note that, thus defined, the operation time varies from one set of identical 
circuits to another (different) set; it also can depend on the sense of the 
signal change, i.e. the 0 → 1 operation time is not necessarily equal to 
the 1 → 0 operation time. For a given circuit design involving tolerances 
it is usual to quote the average operation time for both ways, obtained 
by connecting a number n of flipflops in a loop closed by a NOT circuit; 
the period T of the oscillations setting in is then $T = 2n\tau + 2\tau'$, where 
$\tau$ is the average operation time of a flipflop and $\tau'$ is 
the average operation time of the NOT circuit. If the latter is unknown, 
we can measure $2(n+1)\tau + 2\tau'$ by adding one more flipflop and find $2\tau$ by 
subtraction. 

(5) The "critical level" has been defined for flipflops in the preceding 
section. For other elements it is defined as the level which turns 
over a flipflop following this element. 
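
The subtraction described above can be written out as a short calculation. This is only a sketch: the function name and the numerical periods are hypothetical, and it assumes the two loops differ by exactly one flipflop and share the same NOT circuit.

```python
def operation_times_from_periods(n, period_n, period_n_plus_1):
    """Recover the average operation times from two loop measurements.

    period_n        = 2*n*tau + 2*tau_not       (loop of n flipflops and a NOT)
    period_n_plus_1 = 2*(n + 1)*tau + 2*tau_not (one more flipflop)
    """
    # Subtracting the two periods leaves 2*tau.
    tau = (period_n_plus_1 - period_n) / 2.0
    # Substituting tau back gives the NOT circuit contribution.
    tau_not = (period_n - 2.0 * n * tau) / 2.0
    return tau, tau_not


# Hypothetical measurement: 4 flipflops give 150 mus, 5 flipflops give 180 mus.
tau, tau_not = operation_times_from_periods(4, 150.0, 180.0)
```

With these made-up periods both the flipflop and the NOT circuit come out at 15 millimicroseconds.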

It should be remarked that the advantages of the average operation 
time are: (1) it refers to a circuit running under non-idealized conditions 
(i.e. not driven by a zero-impedance generator with zero rise-time), (2) it 
is an additive property for all circuits having the same output swing and 
the same discrimination levels (supposing that input impedances are high). 

4.10 Flow-Gating 

In some of the newer circuit designs a new principle of information 
transfer is used which will be called "flow-gating" for short. The fundamental 
idea involved will be explained in a simple case. Consider a Schmitt trigger 
circuit using a single supply voltage only. The input and output of such a 
circuit (which can be in either one of its states) can then exist at voltage 
levels which depend linearly on the supply voltage. In order to gate from 
one such circuit to another, a simple diode connection is used and the 
average potential of the two circuits is made different in such a way that 
information flows through the diodes. It turns out that the clearing, which 
has to precede the setting, can be accomplished automatically in the process 
of changing the average potential. Since information is gated by making it 
flow down a potential gradient (created by a gate signal which controls the 
supply voltage), the system is called "flow-gating". 

Figure 4.9 gives the circuit diagram of the device. 

The theory is as follows. For a fixed -E, T^ acts like a grounded- 
base amplifier (base return voltage = : -ER^ / (R^ + R,-) if a ' =. l) and ^ acts 
like an emitter follower. The fact that the output signal is taken from the 
collector of an emitter follower does not impair its function in the flipflop. 
Note that OUT is in phase with the base of $T_0$, which is used as a 
triggering point. It is seen therefore that $T_0$ and $T_1$ together act like a 
Schmitt trigger. The base of $T_1$ is tied to a bias $-u_0$ through a diode and 
the circuit values are chosen in such a way that OUT is above $-u_0$ in the 
1 state and below in the 0 state. (Note that the left-hand diode does not 
conduct as long as the supply voltage is -E.) 

Figure 4.9 
Layout of a Flow-Gating Flipflop 

Suppose now that we connect several flipflops of the kind just 
described in a linear chain (see Figure 4.10), using diodes with their 
cathodes tied to IN. As long as the three supply voltages $-E_1$, $-E_2$ and 
$-E_3$ are equal to their normal value -E, the output is so much more negative 
(both in the 0 state and in the 1 state) that all diodes are cut off, allowing 
each flipflop to retain its state uninfluenced by neighboring units. 

Figure 4.10 
Gating with a Flow-Gating Flipflop 

Now lower the supply of $FF_2$ from -E to -E'. This is certainly 
not going to influence the neighboring flipflop to the left since the connecting diode is cut 
off by an even greater margin than before. $IN_2$, however, will become more 
and more negative in this process and the moment will come (this depends 
on the judicious choice of -E') when its average value (determined by 
-E' and the base divider resistors) is equal to $-u_0$, the bias applied to the opposite base 
through a diode. As mentioned before, circuit values are chosen such 
that OUT is above $-u_0$ in the 1 state, and below in the 0 state, for a 
supply voltage -E. This means that if the OUT feeding $IN_2$ indicates a 1, $T_0$ is switched 
off, while if it indicates a 0, the bias diode switches $T_1$ off (i.e., $T_0$ on). 
Once this operation is accomplished, the supply voltage is brought back to 
its normal value, -E. One can easily verify that during this transition, 
the state impressed at -E' is conserved and thus "trapped" when all supply 
voltages are equal again. 

It turns out that the collector supply of $T_1$ need not be tied 
to -E. This not only diminishes the current requirements for gating by 
a considerable factor, but also allows us to control the gating out of a 
flipflop without modifying the supply voltage of the flipflop to be gated 
into. It can also be shown that by introducing a bumping diode into the 
collector of $T_1$, we can have a constant output from a flipflop (whether 
it is in the 0 or the 1 state) independently of whether it is being gated 
into or whether it is in its normal supply-voltage range. 

The property described in the last section permits the use of flow-gating 
flipflops in some interesting arrangements. Figure 4.11 shows a possible 
realization of a binary counter stage. It counts individual up-and-down 
sequences. The idea is to store in four flipflops, connected in a ring, the 
pattern 0011. The supplies of opposite flipflops are tied together, and a NOT 
circuit feeds one of the supply busses, the other one being connected directly 
to the counter input (except for some amplification). For each 0-1-0 
change at the input, the 0011 pattern is shifted cyclically. Looking at any 
one of the flipflop outputs, we thus obtain one 0-1-0 change for two 
0-1-0 changes at the input to the counter. 
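
The shifting of the 0011 pattern can be followed with a purely logical model. The sketch below is an illustration only: it assumes that the two flipflops on the gated supply bus each copy the state of their predecessor in the ring, that input level 1 gates flipflops 0 and 2 and input level 0 (through the NOT circuit) gates flipflops 1 and 3; the names `step` and `run` and this bus assignment are hypothetical, and all voltage levels are ignored.

```python
def step(state, level):
    """Apply one input level to the ring of four flipflops.

    Flipflops on the gated bus take over the state of their predecessor;
    the other two keep their state.
    """
    new = list(state)
    gated = (0, 2) if level == 1 else (1, 3)
    for i in gated:
        new[i] = state[(i - 1) % 4]
    return new


def run(levels, state=(0, 0, 1, 1)):
    """Return the list of ring states visited for a sequence of input levels."""
    state = list(state)
    trace = [tuple(state)]
    for level in levels:
        state = step(state, level)
        trace.append(tuple(state))
    return trace


# Two 0-1-0 changes at the input (the input rests at 0 before and after).
trace = run([1, 0, 1, 0])
first_output = [s[0] for s in trace]
```

In this model the ring returns to 0011 after two input pulses, and the output of any single flipflop goes through one 0-1-0 change in that time, which is the halving action of a binary counter stage.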




Figure 4.11 
A Flow-Gating Binary Counter Stage 

4.11 Principles of Circuit Analysis and Synthesis 

In section 4.7 it was seen that satisfactory circuit operation 
demands that certain inequalities between functions of circuit parameters 
(supply voltages, resistors, transistor parameters) be satisfied under 
"worst case" conditions. In general, these inequalities are quite cumbersome 
to handle, especially when nonlinear relationships occur: emitter-base 
voltage as a function of emitter current, or $\alpha$ as a function of 
emitter current (if this is high) are examples of nonlinear relationships. 
The procedure of finding a successful circuit requires that the calculation 
be made many times. Under these circumstances machine calculation by Illiac 
is most useful and a certain number of routines have been written to 
this end (6) (Problem Specifications 819, 982, 879, 928 and 1087; see 
Appendix). 

The input parameters which the routines use in their calculations 
are those over which the designer has some control. Direct control is 
available over resistor values (7), power supply voltages, and load current. 
Direct control is also available to the designer over transistor characteristics 
and power supply and resistor tolerances, but only over a limited 
range of variation, this range being determined by currently available 
hardware. Voltage drop values are also entered as input parameters since 
these are specified after the type of diode or transistor is decided upon. 
There must be enough input parameters to completely specify the circuit, 
however, since these routines are analytic programs. 

It should be mentioned that all five programs calculate only the 
static dc conditions existing in a circuit. Going beyond this would mean 
a considerably longer program and would also necessitate a very detailed 
study of high-frequency transistor behavior. Therefore, the pulse response 
of a circuit is mostly studied on an experimental basis. 



(6) G. H. Leichner: "Designing Computer Circuits with a Computer", 
Journal of the A.C.M., April, 1957. 

(7) Except that sometimes the standard RETMA values have to be respected. 

Evidently, the analytic programs mentioned should be supplemented 
by synthesizing programs. This would allow the reduction of the number of 
input parameters to the minimum dictated by the choice of hardware. There 
would then follow a search for an optimum solution within the remaining 
parameter space. The question, "What is the optimum solution?" is answered 
as follows: Let there be M circuit parameters $p_i$ (i = 1, ..., M) representing 
resistances, transistor alphas etc. The satisfactory operation can then be 
expressed by inequalities between certain functions of the $p_i$'s. $F_j$ being 
numerical constants, it is always possible to write these circuit inequalities 
in the form $F_j \geq f_j(\ldots p_i \ldots)$ ($F_j$ fixed, j = 1, ..., N). Note that N 
and M are generally of the same order of magnitude. To these inequalities 
one can add a small number of inequalities (identical in principle) with 
variable left-hand sides, the parameters being called $G_k$ and the corresponding 
$g_k$'s having the property that they should be maximized (or minimized: this 
reduces to the former case). An example is the supply voltage swing in 
flow-gating. This gives the system $G_k \geq g_k(\ldots p_i \ldots)$ ($G_k$ variable, 
k = 1, ..., n). Note that in general N >> n. 

Let $\pi_i$ be the best commercially-available tolerance on $p_i$ and $\lambda$ 
a "quality parameter" which should be maximized for the optimum circuit and 
which multiplies all tolerances. Operationally, this means that we search 
for the set $p_i'$ which makes $\lambda$ maximum in 

$$F_j \geq f_j(\ldots\, p_i' \pm p_i' \pi_i \lambda \,\ldots)$$

$$G_k \geq g_k(\ldots\, p_i' \pm p_i' \pi_i \lambda \,\ldots)$$

the signs being chosen in each case in such a way that $f_j$ or $g_k$ increases. 

It is then evident that $\lambda_{max}$ found in the above process is a 
function of all $G_k$'s. If $\lambda_{max} < 1$ over the whole range of admissible $G_k$'s, 
no solution exists: even the best components do not guarantee satisfactory 
operation under all drift conditions. Otherwise, in view of the small value 
of n, it is easy to decide at the end which set of the $G_k$'s (with $\lambda_{max} \geq 1$) 
satisfies best given engineering requirements. 

Steps in the direction mentioned have been taken in some 
calculations, e.g., the "flow-gating" flipflop discussed in the last section. 
The above method was, however, only approximated by working with a 
relaxation net and by linearizing all inequalities. 
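
The role of the quality parameter can be illustrated numerically. The sketch below is not one of the Illiac routines and makes several assumptions: the nominal values $p_i'$ are held fixed (no search over them is attempted), the worst case is found by trying every corner $p_i'(1 \pm \pi_i \lambda)$, which is only valid when each $f_j$ is monotonic in each parameter, and the largest admissible $\lambda$ is located by bisection. The names `worst_case`, `lambda_max` and `divider`, and all numerical values, are hypothetical.

```python
from itertools import product


def worst_case(f, p, tol, lam):
    """Largest value of f over all corners p_i * (1 +/- tol_i * lam)."""
    ranges = [(x * (1 - t * lam), x * (1 + t * lam)) for x, t in zip(p, tol)]
    return max(f(corner) for corner in product(*ranges))


def lambda_max(constraints, p, tol, lam_hi=50.0, steps=60):
    """Largest lam in [0, lam_hi] such that F >= worst-case f for every (F, f)."""
    def ok(lam):
        return all(F >= worst_case(f, p, tol, lam) for F, f in constraints)

    if not ok(0.0):
        return 0.0          # even the nominal circuit violates an inequality
    if ok(lam_hi):
        return lam_hi
    lo, hi = 0.0, lam_hi
    for _ in range(steps):  # bisection on the feasibility boundary
        mid = (lo + hi) / 2.0
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


def divider(p):
    """Magnitude of the output of a two-resistor divider: E * R2 / (R1 + R2)."""
    e, r1, r2 = p
    return e * r2 / (r1 + r2)


# Hypothetical example: 10v supply, two 1000 ohm resistors, 1% tolerances,
# and the requirement that the divider output never exceeds 5.5v.
lam = lambda_max([(5.5, divider)], [10.0, 1000.0, 1000.0], [0.01, 0.01, 0.01])
```

Here the result is a little under 4.9, i.e. this toy circuit would tolerate components almost five times worse than the assumed 1% parts; a value below 1 would mean that no solution exists with the best available tolerances.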



4.12 Component Specifications 

The choice of components was made after considerable search, 
especially in regard to diodes and transistors. The most promising triode 
transistor appears to be the Western Electric GF-45011. This transistor, 
a drift type in which the necessary graded base is obtained by a diffusion 
process, was designed to meet switching circuit requirements suggested in 
part by the Digital Computer Laboratory. All elements are considerably 
derated in the circuits. 

Resistors: any commercially available 1% deposited carbon resistors 
offering good aging and temperature stability. Derating: 50% for power, 
a factor of 3 for tolerance (i.e., it is assumed that these resistors are 
at all times within 3% of their nominal value). 

Power Supplies: short and long time stability better than 1% for 
all line and load variations. Derating: a factor of 3 for tolerance 
(i.e., the supply is assumed to be at all times within 3% of the nominal 
voltage). 

Transistors: Western Electric, type GF-45011. 

    $V_{EB}$ forward ($I_e$ = 10 ma, $V_c$ = -4v)              0.4 ± 0.1v 
    $V_{EB}$ reverse ($I_e$ = -100 µa, collector open)         greater than 1.0 volts 
    $V_{CB}$ reverse ($I_c$ = -100 µa, emitter open)           greater than 25 volts 
    $h_{fe}$ ($I_e$ = 10 ma, $V_c$ = -10v, f = 100 mc), 
        current gain, grounded emitter                         greater than 7.2 db 
    $r_b'$ ($I_e$ = 10 ma, $V_c$ = -10v, f = 250 mc)           less than 100 ohms 
    $\alpha_{DC}$ ($I_e$ = 10 ma, $V_c$ = -4v)                 greater than 0.95 
    $C_c$ ($V_c$ = -10v) (measured without header)             less than 2.0 µµf 
    Collector dissipation at 25° C                             greater than 200 mw 

Derating: Collector dissipation at 25° C assumed to be less than 150 mw. 
$\alpha_{DC}$ assumed between .93 and 1.00 for a first class of circuits, 
assumed between .98 and 1.00 for a second class. The 
emitter-base junction dissipation is held to about 10 mw. 

Diodes: Qutronics, types Q5-250 and Q10-600. 

Specifications for type Q5-250: 

1. Static characteristics: 

   a) For a current of 5 milliamperes in the forward direction, 
      the voltage across the diode shall be 0.35 ± 0.02 volt. 

   b) For a current of 10 milliamperes in the forward direction, 
      the voltage across the diode shall be 0.40 ± 0.025 volt. 

   c) For a current of 100 microamperes in the reverse direction, 
      the voltage across the diode shall be greater than 5 volts. 

2. Transient characteristics: 

   a) From being initially biased 5 volts in the reverse direction, 
      each diode shall switch so as to conduct 10 milliamperes in 
      the forward direction in not more than $15 \cdot 10^{-9}$ second. The 
      circuit resistance for this test shall be 100 ohms. 

   b) From initially conducting 10 milliamperes in the forward 
      direction, the diode shall be switched so that it is biased 
      5 volts in the reverse direction. In not more than $10 \cdot 10^{-9}$ 
      second after switching to the specified reverse bias, the 
      current in the reverse direction shall not exceed 1.5 milliamperes. 
      Within $80 \cdot 10^{-9}$ second after switching, the 
      current shall be less than 250 microamperes, and within 
      $200 \cdot 10^{-9}$ second the current shall be less than 120 microamperes. 
      The circuit resistance for this test shall be 200 ohms. 

3. Each diode shall be capable of dissipating 100 milliwatts at an 
   ambient temperature of 25° C. 

4. The shunt capacity of each diode, when measured with a reverse 
   bias of 3 volts, shall not exceed 0.5 micromicrofarad. 

5. The physical dimensions of each diode shall be 0.25 ± 0.05 inch 
   in length, exclusive of leads, and shall be approximately 0.1 
   inch in diameter. 

6. Each diode shall be hermetically sealed. 

Derating: None. 

Specifications for type Q10-600: 

1. Static characteristics: 

   a) For a current of 5 milliamperes in the forward direction, 
      the voltage across the diode shall be 0.37 ± 0.025 volt. 

   b) For a current of 10 milliamperes in the forward direction, 
      the voltage across the diode shall be 0.42 ± 0.03 volt. 

   c) For a current of 100 microamperes in the reverse direction, 
      the voltage across the diode shall be greater than 10 volts. 

2. Transient characteristics: 

   a) From being initially biased 5 volts in the reverse direction, 
      each diode shall switch so as to conduct 10 milliamperes in 
      the forward direction in not more than $15 \cdot 10^{-9}$ second. The 
      circuit resistance for this test shall be 100 ohms. 

   b) From initially conducting 10 milliamperes in the forward 
      direction, the diode shall be switched so that it is biased 
      5 volts in the reverse direction. In not more than $10 \cdot 10^{-9}$ 
      second after switching to the specified reverse bias, the 
      current in the reverse direction shall not exceed 2 milliamperes. 
      Within $80 \cdot 10^{-9}$ second after switching, the 
      current shall be less than 600 microamperes, and within 
      $200 \cdot 10^{-9}$ second the current shall be less than 120 microamperes. 
      The circuit resistance for this test shall be 2000 ohms. 

3. Each diode shall be capable of dissipating 100 milliwatts at an 
   ambient temperature of 25° C. 

4. The shunt capacity of each diode, when measured with a reverse bias 
   of 3 volts, shall not exceed 0.5 micromicrofarad. 

5. The physical dimensions of each diode shall be 0.25 ± 0.05 inch in 
   length, exclusive of leads, and shall be approximately 0.1 inch in 
   diameter. 

6. Each diode shall be hermetically sealed. 

Derating: None. 
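
The reverse-recovery limits of item 2 b) lend themselves to a simple acceptance check. The following sketch is an illustration only: the table of limits is transcribed from the two specifications above, while the function name, the idea of sampling a measured current at the three specified instants, and the exponential recovery curves used as examples are assumptions.

```python
import math

# (time after switching in seconds, current limit in amperes, strict "less than")
RECOVERY_LIMITS = {
    "Q5-250": [(10e-9, 1.5e-3, False), (80e-9, 250e-6, True), (200e-9, 120e-6, True)],
    "Q10-600": [(10e-9, 2.0e-3, False), (80e-9, 600e-6, True), (200e-9, 120e-6, True)],
}


def meets_recovery_spec(diode_type, reverse_current_at):
    """Check a reverse-current curve against the limits of item 2 b).

    `reverse_current_at` is a function giving the reverse current (amperes)
    at a given time (seconds) after switching to reverse bias.
    """
    for t, limit, strict in RECOVERY_LIMITS[diode_type]:
        i = reverse_current_at(t)
        if i > limit or (strict and i == limit):
            return False
    return True


def slow_diode(t):
    """Hypothetical recovery curve: 3 ma decaying with a 15e-9 second time constant."""
    return 3e-3 * math.exp(-t / 15e-9)


def fast_diode(t):
    """Hypothetical recovery curve starting from 2 ma with the same time constant."""
    return 2e-3 * math.exp(-t / 15e-9)


slow_ok_q5 = meets_recovery_spec("Q5-250", slow_diode)
slow_ok_q10 = meets_recovery_spec("Q10-600", slow_diode)
fast_ok_q5 = meets_recovery_spec("Q5-250", fast_diode)
```

The first hypothetical curve still carries slightly more than 1.5 ma after $10 \cdot 10^{-9}$ second, so it fails the Q5-250 limit while passing the looser Q10-600 limit.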

4.13 General Remarks about the Next Sections 

In the next eleven sections some of the circuits which could be 
incorporated in a new very high-speed computer will be shown by way of 
example. It is by no means certain that these circuits will not be improved 
in the near future, either by using transistors satisfying tighter 
specifications or by incorporating into them certain features which will 
possibly improve their performance. In this line one could think of gating 
into flipflops of the Eccles-Jordan type by essentially flow-gating 
principles. Another possibility, successfully tried recently, is the 
replacement of the Eccles-Jordan flipflop by what amounts to the "back-to-back" 
combination of Schmitt triggers (the emitters and bases of the 
amplifiers being cross-coupled). 

It should be noted that efforts have been made to find resistor 
values available as RETMA standard values. 

For convenience' sake, the use of positive logic will be assumed; 
i.e. nominally 

"1" corresponds to +1.6v 
"0" corresponds to -1.6v. 

Note that these voltages are measured at the output of emitter-followers 
in most cases. The bumps are -2.0v and +1.2v to take account of the 
(average) emitter-base drop of .4v. 

The ordering principle for the rest of this chapter is the tolerance 
of circuits. If no complete tolerance analysis has been made, the circuit is 
automatically relegated to a later section. Four classes of circuits can be 
distinguished: 

Class 1: Designed according to section 4.11 and using the component 
         specifications of 4.12. 

Class 2: Designed according to section 4.11, but using tighter 
         specifications: $.98 \leq \alpha_{DC} \leq 1.00$ and a resistor tolerance 
         of 2% only. Non-standard output levels. 

Class 3: Designed by procedures similar to those of section 4.11 
         but not quite as exact; in particular no table of emitter-base 
         drops was stored with the program. Component specifications 
         of 4.12, except for resistors, which have only 2% 
         tolerance. 

Class 4: Designed by diverse procedures of varying exactitude. The 
         component specifications are those of Class 2. 



4.14 NOT Circuit (Class 1) 

As Figure 4.12 shows, the NOT circuit is a constant-current-emitter 
inverter with a diode clamp holding the emitter down in order to allow 
switching. A bumped stepdown network drives an output emitter-follower. 
For the nominal signals, the input current does not exceed 1 ma; the maximum 
output current is 3 ma. A collector bump prevents saturation. 

The figure also quotes the more exact relationship between voltages 
and currents for both the input and for the output, since the circuit can 
work satisfactorily with non-nominal inputs. This leads incidentally to 
the use of NOT circuits as LEVEL RESTORERS when inversion can be tolerated. 

Figure 4.12 
15 Millimicrosecond NOT Circuit 

All resistors and power supply voltages within ± 3% of nominal. 
$.93 \leq \alpha_{DC} \leq 1.0$ 

Maximum Input Current:   $I_i = -.829 + .033\,V_i$     for $V_i \leq -.5$v 
                         $I_i$ undefined               for $-.5 \leq V_i \leq +.41$v 
                         $I_i = 0$                     for $V_i \geq +.41$v 

Output Voltage:          $-2.08 \leq U \leq -1.63$v    Input "One" 
                         $+1.51 \leq U \leq +2.02$v    Input "Zero" at $I_L$ = 3 ma 
                         $U \leq -1.5$v                Input "Zero" at $I_L$ = 5 ma 

Input Voltage:           $-2.5 \leq V_i \leq -.5$v     "Zero" (Positive Logic) 
                         $+.41 \leq V_i \leq +2.5$v    "One" (Positive Logic) 

Operation Time:          ≤ 15 mµs 

4.15 Level Restorer (Class 1) 

The circuit in Figure 4.13 is essentially the one described in 
the last section. Here, however, the inverter is replaced by a (non- 
inverting) grounded-base amplifier and this in turn necessitates the 
input emitter-follower. Note that no diode has to be used in the emitter 
of Transistor T^. 

The comments and notes of the last section concerning currents 
and voltages apply equally well to the circuit under consideration. 

Figure 4.13 
15 Millimicrosecond Level Restorer 

All resistors and power supply voltages within ± 3% of nominal. 
$.93 \leq \alpha_{DC} \leq 1.0$ 

Maximum Input Current:   $I_i = -.868 + .0342\,V_i$    for $V_i \leq -.5$v 
                         $I_i$ undefined               for $-.5 \leq V_i \leq +.5$v 

Output Voltage:          $-2.08 \leq U \leq -1.63$v    Input "Zero" 
                         $+1.51 \leq U \leq +1.97$v    Input "One" at $I_L$ = 3 ma 
                         $U \leq -1.5$v                Input "One" at $I_L$ = 5 ma 

Input Voltage:           $-2.5 \leq V_i \leq -.5$v     "Zero" (Positive Logic) 
                         $+.5 \leq V_i \leq +2.5$v     "One" (Positive Logic) 

Operation Time:          ≤ 15 mµs 

4.16 OR Circuit (Class 1) 

The OR circuit of Figure 4.14 is a (triple) diode OR for positive 
logic, each diode being driven by a separate emitter-follower. Note that 
there is one "output-determining" transistor: the one corresponding to the 
most positive input. 

In designing an OR circuit, it is important to consider the dc 
level change or "deviation" which a signal suffers in its transfer from 
input to output. The signal to be considered is precisely the one which 
determines the output voltage. A very meaningful measure of this deviation 
is the number of similar circuits allowed in cascade. The table below 
presents a compilation of input voltages, deviation and output voltage for 
each successive stage in a chain of OR circuits. The first stage is assumed 
to be driven by a slightly degenerated signal of -1.500v coming from a 
level-restorer or a NOT circuit. Since a NOT circuit can restore a negative 
signal which is only -0.55v, it is seen that 5 OR's can be cascaded before 
going into another NOT or a level-restorer. Note that the emitter-base 
drops and the diode drops cancel more or less under average conditions. 

Table for Cascaded OR Circuits 

    Stage     Input     Max. Deviation     Output 

      1      -1.500          .176          -1.324 
      2      -1.324          .174          -1.150 
      3      -1.150          .171          -0.979 
      4      -0.979          .168          -0.811 
      5      -0.811          .165          -0.646 
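
The bookkeeping behind the table is a running sum: each stage's output is its input plus that stage's maximum deviation, and the chain is acceptable as long as the output stays more negative than the level a NOT circuit can still restore. A minimal sketch, using the deviations and the -0.55v limit quoted above (the function name is hypothetical):

```python
def cascade_outputs(first_input, deviations):
    """Output voltage after each stage, given the per-stage maximum deviations."""
    outputs = []
    level = first_input
    for d in deviations:
        level = round(level + d, 3)  # the table is quoted to three decimals
        outputs.append(level)
    return outputs


deviations = [0.176, 0.174, 0.171, 0.168, 0.165]
outputs = cascade_outputs(-1.500, deviations)

# A NOT circuit can restore a negative signal as weak as -0.55v.
RESTORABLE_LIMIT = -0.55
all_restorable = all(u <= RESTORABLE_LIMIT for u in outputs)
```

With these figures the fifth output, -0.646v, is still on the restorable side of -0.55v, in agreement with the count of five stages.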



This shows that for negative swings, 5 OR's can be cascaded. For positive 
swings, conditions are similar. 

Figure 4.14 
5 Millimicrosecond OR Circuit 

All resistors and power supply voltages within 3% of nominal. 
$.93 \leq \alpha_{DC} \leq 1.0$ 

Maximum Input Current to Output-Determining Transistor: 
    $I_i = .634 + .064\,I_L$      for $V_i \geq -2$v 

Output Voltage:   Max. Values: $U = +.101 + .983\,V_i + .026\,I_L$ 
                  Min. Values: $U = -.076 + .99\,V_i + .013\,I_L$ 
    for $-2 \leq V_i \leq 2$v and $0 \leq I_L \leq 3$ ma, with $I_L$ in milliamperes 

Maximum Input Current to Non-Output-Determining Transistor: 
    $I_L$ = 1.1 ma                for $V_i \geq -2$v 

Total number of circuits allowed in cascade (each driving +2 ma load, 
standardized initial input): 5 

Operation Time: ≤ 5 mµs 

4.17 AND Circuit (Class 1) 

The AND circuit of Figure 4.15 is straightforward since pnp 
transistors are used. Note however the voltage divider in the output 
which compensates for the average emitter-base drop. Here again there 
is an "output-determining" transistor: the one corresponding to the most 
negative input. Cascading questions can be treated in a way similar to 
that of the last section. 



Figure 4.15 
5 Millimicrosecond AND Circuit 
(Positive Logic) 

All resistors and power supply voltages within ± 3% of nominal. 
$.93 \leq \alpha_{DC} \leq 1.0$ 

Maximum Input Current to Output-Determining Transistor: 
    $I_i = -.691 - .068\,I_L$     for $V_i \geq -2$v 

Output Voltage:   Max. Values: $U = +.031 + .969\,V_i + .066\,I_L$ 
                  Min. Values: $U = -.173 + .978\,V_i + .057\,I_L$ 
    for $-2 \leq V_i \leq +2$v and $0 \leq I_L \leq 3$ ma, with $I_L$ in milliamperes 

Total number of circuits allowed in cascade (each driving +2 ma load, 
standardized initial input): 5 

Operation Time: ≤ 5 mµs 

4.18 Flow-Gating Flipflop (Class 2) 

The design principles of this type of circuit having been discussed 
at length in section 4.10, very few comments will suffice. The first one is 
to draw attention to the fact that resistors have to be of the 2% variety 
and that $.98 \leq \alpha_{DC} \leq 1$; the second one is that two non-standard diodes are 
used, the type number being GA4-100 (Hughes). These diodes can withstand 
higher reverse voltages than the standard Qutronics types. 

Note the presence in the output collector of a bumping diode to 
prevent saturation when the supply voltage is at its normal level and 
also to stabilize the output. The latter is -10v or -20v, depending on 
the state, i.e. as designed, the levels are not the standard ± 1.6v levels. 



Figure 4.16 
50 Millimicrosecond Flow-Gating Flipflop 

Power supplies within 3% of nominal value. 
Resistors within 2% of nominal value. 
$.98 \leq \alpha_{DC} \leq 1$ 

Maximum Input Current: 2.75 ma 

Minimum Output Current: 3.75 ma 

Output Voltages: -10v and -20v (nominal). 

Supply and Gating Voltage: -20v (normal), -45v (gate-in) 

Operation Time: ≤ 50 mµs 

4.19 Schmitt-Trigger (Class 3) 

Figure 4.17 below gives circuit values for a last moving point 
Schmitt trigger flipflop. Note the presence of collector bumps, tied to 
-4v, and also the fact that the stepdown network in the regenerative loop 
has a much higher impedance than that going into the output emitter-follower. 
The input is tied to two separate diodes in order to be able to gate in 
both directions, the average gating current (for either case) being about 
2.5 ma. With an output driving capacity of about 5 ma this gives a fan-out 
of 2 for the circuit. The operation time is approximately 15 mµs. 



Figure 4.17 
15 Millimicrosecond Schmitt Trigger Flipflop 

(All diodes are Qutronics Q5-250) 

4.20 Eccles-Jordan (Class 3) 

Figure 4.18 gives the circuit values for a last moving point 
Eccles-Jordan flipflop. Again collector bumps are used, but this time (since 
there is no longer a grounded-base amplifier in the circuit), supplementary 
bumps are used at the bases of the amplifiers and also from the common 
emitter to ground: this is necessary in order to respect the reverse 
specifications for the emitter-base junctions. Otherwise the circuit is 
much like two cross-coupled Schmitt triggers, although the stepdown networks 
are much more similar in impedance; this is possible because double 
gating eliminates some of the awkward conditions encountered in a single-ended 
flipflop. Gating currents and output currents are, however, very 
similar to those mentioned in section 4.19, as is the fan-out of this circuit 
(2). The operation time with the direction of gating indicated is 
about 30 mµs. 

4.21 EXCLUSIVE OR (Class 4) 

The principle of the EXCLUSIVE OR circuit of Figure 4.19 is well 
known: the input signals are applied to two transistors with a common 
collector load, the bases and emitters being cross-coupled. Then the 
common collector reacts only when unlike signals are received, at least 
as long as the swings applied to T^ and T^ are equal. . The latter con- 
dition necessitates the re- standardization of the incoming signals: This 
is achieved by the collector bumps for T^ and T^. -Note that the output 
is taken from an emitter-follower, the dc stepdown being on the emitter 
side. The circuit operates with normal ± 1.6v signals and the operation 
time is about 50 mµs. The fact that one can obtain an EXCLUSIVE OR 
operation by the common AND-NOT-AND-OR combination in about 20 mµs is 
less relevant when the number of transistors and diodes is compared: 
this circuit uses only 5 transistors vs. 8 for the combination. However, 
if the often required AND function has to be obtained too, the combination 
is probably more attractive: its use has therefore been assumed in this 
report. 

Figure 4.18 
30 Millimicrosecond Eccles-Jordan Flipflop 

(GATE = +1.2v gating, GATE = -2v non-gating. 
All diodes in collectors are Q10-600, all others Q5-250. 
All resistors are 1/2 w, 2%.) 

Figure 4.19 
50 Millimicrosecond EXCLUSIVE OR 
(Positive Logic) 



4.22 Driver (Class 4) 

The diode gates described in the section on Eccles-Jordan flipflops
necessitate large driving currents. Figure 4.20 indicates a driver capable
of controlling 8 such gates. This driver is composed of a level-restorer
type preamplifier and a number of emitter-followers, the last two being in
parallel. Normal signals are used at both input and output, i.e. the
gating voltages of section 4.20 are more than attained. This driver is
capable of going up and down in less than 50 mus: This corresponds to about
20 mus operation time.

Figure 4.21 gives some of the waveforms observed in the gates and
the flipflops of a shifting register (see section 4.24), the circuits being
driven by a two-phase logical oscillator.

Figure 4.21
Timing Diagram for the 50 mus Shifting Register, Pattern 0001

4.23 C-Element (Class 4)

Figure 4.22 shows a possible design for a C or $\bar{C}$ element. This is,
by definition, an element described by the following equilibrium table:

   in 1    in 2     C     $\bar{C}$

    0       0       0       1
    0       1     depending on last state
    1       0     depending on last state
    1       1       1       0

The $\bar{C}$-element differs from a C-element only by having the complemented outputs.

As can be seen, the circuit consists essentially of a Schmitt
Trigger, controlled by a standardizing input amplifier. The center-point
M between the collectors is "up" when both inputs are "down" and vice-versa.
If the inputs differ, M stays at an intermediary voltage such that
(especially because of the "hysteresis" diode combination H) the flipflop
part does not react. The output shown corresponds to a $\bar{C}$-function, but the
output could also be taken from P with a suitable emitter-follower and a
dc-stepdown network: This would give the C-function.



Figure 4.22
C or $\bar{C}$ Element



4.24 Shifting Registers

For the flipflops discussed in this chapter, gating facilities
have been indicated in all cases. However, the full extent of the problem
of transferring information between flipflops has not been covered. This
section will bring up some aspects of transfer and shifting with a view to
obtaining a reasonable estimate of the number of parts involved. To
simplify the discussion only the Eccles-Jordan flipflop of section 4.20
will be considered.

It is easily seen that the changes of state of the circuit are
produced by providing a positive signal at the base of that one of the
center transistors which happens to be conducting. For this purpose a
positive voltage is injected through a diode from either S or T (see
Figure 4.18). Since positive logic is used, the properties of the flip-
flop as seen from S and T as inputs are described by the following
equilibrium table:

    S     T      out 1   out 2

    0     0      depending on last state
    0     1        1       0
    1     0        0       1
    1     1      not allowed



The diodes going from $\mathrm{IN}_1$ and GATE to S and from $\mathrm{IN}_2$ and GATE
to T are operationally equivalent to a (positive) AND circuit. Symbolizing
the flipflop with S and T as inputs by the usual symbol, the transfer of
information between two flipflops corresponds to the arrangement of
Figure 4.23. For obvious reasons this procedure is called "double-gating":
There are two possible paths for the transfer of information. Considerations
will be limited to this system since it is inherently faster than the
"clearing-and-gating" system in which flipflop 2 is set to the zero state
(zero side output = 1) by a clearing signal which precedes the conditional
transfer of a one.




Figure 4.23

Double Gating

Shifting in an asynchronous machine is accomplished by adding to
the principal register an auxiliary register. The latter is then connected
back to the principal register by AND gates (not shown in Figure 4.24)
leading to a flipflop one digit position to the right or one digit position
to the left. Figure 4.24 shows a second input for the flipflops of the
auxiliary register and a second output from the flipflops of the principal
register: These connections are necessary if the double register is to
perform any useful function. Each flipflop therefore necessitates two
inputs and two outputs.




Figure 4.24

Shifting Register

Figure 4.25 shows more explicitly what is contained in the
dotted box indicated in Figure 4.24. Since information is received
from two sources, the flipflop must be preceded by two OR circuits.
Practically this can be achieved by adding a second diode from the
terminals of the second pair of inputs to the bases of the amplifying
transistors of the Eccles-Jordan flipflop.

Figure 4.25

Circuitry Associated with One Flipflop 

It should be noted that the simple diode AND gates of Figure
4.18 are sufficient if not more than two other flipflops are connected
to any one flipflop. Otherwise the AND's of section 4.17 must be used;
a "fan-out" of the order of 8 can then be obtained. The use of the
standard transistor AND's shall be assumed.

The component count for all circuits associated with one flip-
flop using the standard AND's as gates and the simplified OR's described
above is given below. For completeness' sake drivers as described in
section 4.22 have been added: One such driver can take care of 32 AND's;
i.e. per flipflop (involving 4 AND's), 0.125 drivers are needed.

                              Transistors     Diodes     Resistors

1 flipflop without gates          4            11           13
4 AND's                           8             0           12
OR diodes                         0             2            0
0.125 driver                      0.625         0.375        1.250

Total                            12.625        13.375       26.250



For a 52-bit shifting register 104 flipflops with their associated circuitry
are needed. This implies the use of 1313 transistors, 1391 diodes and 2730
resistors.

It should be mentioned that diode AND's can be used under the
circumstances outlined above: This would save 832 transistors but the
diode count would be increased by the same figure.
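
The arithmetic behind these totals can be reproduced with a short Python
sketch. The per-circuit figures are those of the table above; the figure of
10 resistors for one complete driver is inferred here from the 1.250 entry
for 0.125 driver and is therefore an assumption, as are all the names used.

```python
from fractions import Fraction

# (transistors, diodes, resistors) charged to one flipflop, from the table above.
PARTS_PER_FLIPFLOP = {
    "1 flipflop without gates": (4, 11, 13),
    "4 AND's": (8, 0, 12),
    "OR diodes": (0, 2, 0),
}
# One complete driver; the resistor figure (10) is inferred from 1.250 / 0.125.
DRIVER = (5, 3, 10)
# 4 AND's per flipflop, 32 AND's per driver.
DRIVERS_PER_FLIPFLOP = Fraction(4, 32)


def per_flipflop_totals():
    """Exact totals (transistors, diodes, resistors) for one flipflop."""
    totals = [Fraction(0)] * 3
    for parts in PARTS_PER_FLIPFLOP.values():
        totals = [t + p for t, p in zip(totals, parts)]
    totals = [t + DRIVERS_PER_FLIPFLOP * d for t, d in zip(totals, DRIVER)]
    return tuple(totals)


def register_totals(bits: int = 52):
    """Totals for a shifting register: principal plus auxiliary flipflop per bit."""
    flipflops = 2 * bits
    return tuple(t * flipflops for t in per_flipflop_totals())
```

For 52 bits the sketch multiplies the per-flipflop totals 12.625, 13.375 and
26.250 by 104 flipflops; exact fractions are used so that no rounding enters.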



4.25 Summary and Closing Remarks

This chapter shows conclusively that it is possible to design
transistorized circuitry for asynchronous dc-coupled operation with operation
times in the 5 - 50 mus region. Most of these circuits also satisfy very
strict tolerance requirements and therefore guarantee excellent reliability.
Some life tests of units having up to 40 transistors have been very en-
couraging, as have some experiments concerning noise sensitivity.

The following table gives a synopsis of the circuits as regards
the number of transistors, the number of diodes and the operation time.

                          No. of         No. of      Operation
Circuit                   Transistors    Diodes      Time (mus)

NOT                           2             4           15
Level Restorer                3             3           15
AND                           2             0            5
OR                            2             2            5
Flow-Gating Flipflop          2             3           50
Schmitt Trigger               3             5           15
Eccles-Jordan                 4            15           30
EXCLUSIVE OR                  5             7           50
Driver                        5             3           20
C-Element                     4             6

APPENDIX

As was mentioned in an earlier section, a number of programs have been
written which analyze the operation of given circuits under given conditions.
These programs correspond to the following problem specifications:

Problem Specification 819  - Verify a non-last-moving-point
                             Schmitt trigger
Problem Specification 982  - Verify a non-last-moving-point
                             Eccles-Jordan
Problem Specification 879  - Verify NOT circuit of section 4.14
Problem Specification 928  - Verify OR and AND circuits of
                             sections 4.16 and 4.17
Problem Specification 1087 - Verify Flow-Gating Flipflop of
                             section 4.18

As an example of the procedure, some details of Problem Specification
879 are given below:

The program requires that the values of the circuit components
and the four values of the input voltage, v, (corresponding to the minimum
and maximum values of both of the logical values 0 and 1) be used as
program input parameters. The program then calculates the output voltage
and other quantities under "worst case" conditions for each of the four
circuit input voltages.

Figure 4.26
NOT Circuit 

In addition to the above notation, the following applies:

$V^+_x$ = the maximum positive value of the input voltage
$V^+_m$ = the minimum positive value of the input voltage
$V^-_m$ = the minimum negative value of the input voltage
$V^-_x$ = the maximum negative value of the input voltage
$\alpha_m$ = the minimum value of $\alpha$
$\alpha_x$ = the maximum value of $\alpha$
$a$ = the fractional tolerance on all power supply voltages
$b$ = the fractional tolerance on all resistors
$I_{Lx}$ = the maximum value of output load current
$I_{co}$ = the maximum collector cut-off current
$I_{Lm}$ = the minimum value of output load current
$P$ = a negative number implies that diode D is used; a positive number
      implies that D is not used

Parameter Tape: The following data may be placed on a separate tape and
read in after the main tape: the four input voltage limits, the resistor
values, F, -V, E, $\alpha_m$, $\alpha_x$, $a$, $b$, $I_{co}$, $I_{Lm}$, $I_{Lx}$, $P$, $K$.
These are to be entered in the order
listed in standard floating decimal notation. (Note: K must be entered
but may be anything if no diode is used.)

Operation: The program tape is read into the computer in the standard
manner. At the end of this tape, a black switch stop will occur. The
parameter tape is then placed in the reader and the stop is bypassed.

Output: The output information is (in addition to a reprint of the para-
meter tape):

$u^-(1)$ = most negative output of logical "1"
$u^+(1)$ = most positive output of logical "1"
$u^-(0)$ = most negative output of logical "0"
$u^+(0)$ = most positive output of logical "0"
VCX = maximum collector-to-base voltage
WCX = maximum collector power dissipation
IEX = maximum emitter current
DIODE = printed if diode is employed at collector.

Note: The diode is assumed to be perfect.

The equations to be solved are as follows:

F ,- V ■- ■ V. 
1 T = be 

° 6 " H ll( I e ) +R E 
2. $V_{be} = I_e H_{11}(I_e)$



3, ' w = 



i _i_ i 

R 1 R 2 R 1 R 3 Vl 



-1^2 ^1^3 ^2^3 

Before the foregoing equations are solved in each of the four cases,
the various parameters are multiplied by tolerance factors corresponding to
"worst case" conditions.

An iterative procedure is employed to solve equations 1 and 2
since $H_{11}(I_e)$ is a nonlinear function which is entered in tabular form.
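
The kind of iteration meant here can be illustrated by a minimal Python
sketch. It is not the program of Problem Specification 879: the table of
$H_{11}$ values, the simplified loop equation $I_e = E/(H_{11}(I_e) + R_E)$ and
all names are assumptions chosen only to show how a tabulated nonlinear
function can be combined with a fixed-point iteration.

```python
from bisect import bisect_right

# Hypothetical table of H11 (ohms) against emitter current (amperes).
H11_TABLE = [(0.0, 200.0), (0.001, 100.0), (0.005, 40.0), (0.010, 25.0)]


def h11(i_e: float) -> float:
    """Linear interpolation in the table, clamped at both ends."""
    currents = [i for i, _ in H11_TABLE]
    if i_e <= currents[0]:
        return H11_TABLE[0][1]
    if i_e >= currents[-1]:
        return H11_TABLE[-1][1]
    k = bisect_right(currents, i_e)
    (i0, h0), (i1, h1) = H11_TABLE[k - 1], H11_TABLE[k]
    return h0 + (h1 - h0) * (i_e - i0) / (i1 - i0)


def solve_emitter_current(drive_volts: float, r_e: float,
                          tol: float = 1e-12, max_iter: int = 200) -> float:
    """Fixed-point iteration on the assumed equation I = E / (H11(I) + R_E)."""
    i_e = drive_volts / r_e              # first guess: ignore H11 altogether
    for _ in range(max_iter):
        new_i = drive_volts / (h11(i_e) + r_e)
        if abs(new_i - i_e) < tol:
            return new_i
        i_e = new_i
    raise RuntimeError("iteration did not converge")
```

With a 5 v drive and a 1000 ohm emitter resistor the iteration settles after
a few passes because $H_{11}$ is small compared with $R_E$; a "worst case" run
would repeat the solution with each parameter first multiplied by its
tolerance factor.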

CHAPTER 5
ASYNCHRONOUS CIRCUITS 



5.1 Introduction 

In Chapter 4 some general remarks were made concerning the desira-
bility of using asynchronous dc-coupled logic in the construction of the
computer. Here we shall look more closely at the properties of asynchronous 
circuits and assess their advantages and disadvantages from various stand- 
points. Since it is proposed that the computer under discussion be built
using the asynchronous philosophy, it is important to determine whether or 
not such a design is feasible and what its advantages and cost will be. It 
is further proposed that the behavior of the computer be made independent 
of the relative speeds of its elements whenever this can be done conveniently, 
and special scrutiny will be made of such circuits and their properties. As 
was pointed out in Chapter 4, one may attach special importance to a design
which follows these principles when the circuits under consideration are so 
fast that the time required for information to flow from one part of the
computer to another is comparable to the times required by the logical ele- 
ments. By designing the computer so that its behavior is independent of 
the relative speeds of the elements, one may ignore the problem of matching 
speeds and synchronizing signals to achieve correct operation, thus separating 
the problems of logical design and physical layout. 

5.2 Speed and Complexity 

A given computer may be designed from fewer logical elements if 
the synchronous philosophy is followed, since one need not include the addi- 
tional elements which are required in asynchronous circuits to generate 
completion signals and hold information while it is in transit. This is true, 
in general, since any asynchronous computer could be made to work in the syn- 
chronous mode, but a synchronous computer may contain intolerable "races" if 
it is allowed to run asynchronously. Nevertheless, the situation is not 
entirely one-sided since a synchronous computer must contain the additional 
electronic equipment required by the clock and its gates. 

It is also true that additional time is required by certain asynchronous
circuits to form signals which indicate the completion of one operation and which 
initiate the next. As was mentioned in Chapter 4, such times must be balanced
against the time which must be wasted by setting the clock period so as to be 
safely longer than the time taken by the slowest-acting combination in a 
synchronous system. The latter time, however, is likely to be longer since the 
clock frequency must be adjusted to the slowest of all the synchronized operations. 
Thus, depending on the method of synchronizing, a shift may take as long as an 
addition. In order to avoid such discrepancies it is necessary to use multiphase 
clocks and a correspondingly more complicated control. No matter how complex the 
synchronous system may be, it will only approximate the speed of the asynchronous
system, discounting the time tolerances which are allowed in the synchronous 
system and the time to generate completion signals in the asynchronous system. 
Time tolerances in a synchronous system depend upon the margin of safety which 
one wishes for the system and upon the variation of response times which may be 
expected among the circuit elements. In the case of transistors the variation 
of characteristics is large, and for this reason the time tolerances must be 
taken as correspondingly large. An even greater variation may be expected among 
elements which have been aged for some time in the computer, and it is with
respect to this latter variation that the time tolerances must be set for a 
synchronous system. 
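
The trade-off described in this section can be made concrete with a small
Python sketch. The operation times, the tolerance allowance and the
completion-signal time below are hypothetical figures chosen for
illustration; a single-phase clock is assumed, so multiphase clocking is not
modelled.

```python
def synchronous_time(ops, op_time, tolerance=0.0):
    """Single-phase clock: every operation costs one period, and the period
    is set by the slowest operation plus a fractional time tolerance."""
    period = max(op_time.values()) * (1.0 + tolerance)
    return len(ops) * period


def asynchronous_time(ops, op_time, completion_time):
    """Each operation takes its own time plus the time needed to form the
    completion signal which initiates the next operation."""
    return sum(op_time[op] + completion_time for op in ops)


# Hypothetical figures (arbitrary time units): nine shifts and one addition.
OP_TIME = {"shift": 20.0, "add": 100.0}
OPS = ["shift"] * 9 + ["add"]
```

With these figures, `synchronous_time(OPS, OP_TIME, 0.25)` charges every
shift as much as an addition, while `asynchronous_time(OPS, OP_TIME, 10.0)`
charges each operation only its own time plus the completion-signal time.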

5.3 Design of Asynchronous Circuits

Very powerful methods are known for carrying out the logical design of 
synchronous circuits. These methods reduce a large part of the design to no 
more than an exercise in algebra, and they permit the design of significantly 
simpler computers than the more empirical methods which also require more time 
and effort on the part of the designer. 

Until recently, no similarly elegant methods were known for designing
asynchronous computers because the theory of asynchronous circuits was incom-
pletely developed. Hence, some computer designers were tempted to resort to
inferior engineering techniques so that the logical design could be carried out
in a straightforward manner. This situation no longer exists and systematic
methods have now been invented in connection with the theory of asynchronous 
circuits which permit the application of logical design techniques to asynchronous 
design which were heretofore used only in the design of synchronous circuits. 
These methods will be illustrated in the latter part of this chapter. It is true 
that the processes are somewhat more complicated, but this seems a small price to 
pay for the engineering advantages which result from using asynchronous circuits. 
In any case, the formal logical design of the computer circuits represents a very 
small part of the work which must be done in the over-all design of the computer, 
and may be discounted now that known methods exist for carrying it out. 



It was thought at one time that the signal changes taking place within
an asynchronous computer must occur in a definite time sequence if the behavior
of the computer was to be independent of the relative speeds of the elements.
This requirement could only be satisfied by a completely serial machine, and the
resulting slowing of the machine would be prohibitive. With the development of
the theory of asynchronous circuits it was shown that parallel actions could
indeed occur in asynchronous computers without giving rise to undesirable "race"
conditions among the signals. Briefly, this may be accomplished by having
logical elements within the circuit whose logical properties are such that they
respond only when all of several incoming signals appear, thus yielding a com-
pletion signal indicating that all of several parallel operations have taken place.
Not only is there no restriction upon the amount of paralleling which may be done
in this way, but one is permitted to use much more complex paralleling schemes
than are possible with synchronous circuits. For example, if five operations, a,
b, c, d, and e, are required of a circuit, and if the completion of a and b is
required for the initiation of d while only a is required for the initiation of c,
and c and d are required for the initiation of e, we may represent the require-
ments by means of an ordering diagram of the type shown below.

    a ----+----> c ----+
          |            |
          v            v
    b --> d ---------> e

Using recently-developed synthesis techniques for asynchronous circuits one can
design a circuit which corresponds to this diagram, and which would permit c to
begin even if the completion of b were delayed. A synchronous circuit could not
easily be made to yield similar flexibility, for while a and b could be carried
out simultaneously, the initiation of c would not occur until the next clock
pulse following the completion of both a and b.
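
The effect of the ordering relations can be sketched in Python. The
durations and the clock period used in the check are hypothetical, and the
synchronous rule coded here is simply the one stated in the text (c waits
for the first clock pulse after both a and b are complete).

```python
import math

# Ordering relations of the example: each operation lists the operations
# whose completion it must wait for.
REQUIRES = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}


def asynchronous_schedule(duration):
    """Start and finish time of each operation when it begins as soon as all
    of its prerequisites have signalled completion."""
    start, finish = {}, {}
    for op in ("a", "b", "c", "d", "e"):      # a valid topological order
        start[op] = max((finish[p] for p in REQUIRES[op]), default=0.0)
        finish[op] = start[op] + duration[op]
    return start, finish


def synchronous_start_of_c(duration, clock_period):
    """Start of c when it must wait for the first clock pulse at or after
    the completion of both a and b."""
    done = max(duration["a"], duration["b"])
    return math.ceil(done / clock_period) * clock_period
```

If b happens to be slow, say durations of 10 for a and 40 for b, the
asynchronous schedule starts c at time 10, whereas the synchronous rule
holds c back until a and b are both finished.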

5.4 Location of Malfunctions and their Repair 

Many techniques are available for locating malfunctions in synchronous 
and asynchronous machines. The particular techniques employed usually depend 
upon the nature of the malfunction, and without knowing what sorts of malfunctions 
will appear most frequently in the system under consideration, one can hardly
assess the problem. Experience with the Illiac has shown that a malfunction of 
the control often results in the stopping of the computer. When this occurs it 
is possible to trace the trouble by merely measuring voltage levels at critical 
points. It is hoped that if the asynchronous principles are adhered to even 
more strongly in the new machine, it will be correspondingly easier to service. 

A further advantage in the location of malfunctions in an asynchronous 
computer results from the fact that information is held in flipflops. As a 
result, it is possible to slow down operations so that they may be observed 
individually and the individual gates may be checked. Such checks are impossible 
in those synchronous computers where the information is circulating at high speed 
through the logical elements and is lost if the clock frequency is reduced. 

5.5 Reliability 

It seems likely that as a transistor develops faulty behavior it will 
exhibit this by going into saturation during otherwise normal operation. This 
would have the effect of slowing down the action of the logical element involved. 
In a synchronous circuit this type of malfunction would be likely to cause an 
error in computation, but in an asynchronous circuit the effect would be merely
to slow down the operation of the computer, provided it is built using
the principles of speed independence which will be explained later. Such
slowing of the computer could be detected by using checking routines and the 
fault could be located before it caused an incorrect calculation. 

Some of the newer and more elaborate synchronous computers have been 
designed to operate correctly regardless of how slowly or rapidly the clock may 
run. Certainly, such computers possess most, if not all, of the advantages of 
an asynchronous machine. By the same token, however, they are correspondingly 
more complicated and tend to require as many additional logical circuits as an 
asynchronous computer. Hence, by eliminating the clock and resorting to com- 
pletely asynchronous operation, one may actually achieve the same results, 
while obtaining a faster machine with fewer elements.

5.6 Asynchronous Principles 

We now turn to a brief discussion of the principles of asynchronous 
circuit design, in which intuitive notions will be substituted for the more 
lengthy rigorous treatment appearing elsewhere.(1) In this theory one must
make two important approximations. First, all signals are taken as assuming
only discrete values. In the binary case this means that only the two signal
values 0 and 1 are allowed. In physical circuits the signals do, in fact,
assume any of a continuum of values. Therefore one may make the approximation
by splitting the continuum into two bands corresponding to 0 and 1, separated
by a region of indeterminacy. The approximation now consists of saying that
the logical behavior of a given element is independent of where the incoming
signals lie in the bands.



1. D. E. Muller and W. S. Bartky: "A Theory of Asynchronous Circuits",
Digital Computer Laboratory Reports Nos. 75 and 78, to be published in
the Proceedings of a Symposium on the Theory of Switching, Harvard
University Press.

Second, we must say that no variations occur in the position of the
region of indeterminacy which are large enough to cause two elements to interpret
the same incoming signal differently, that is, so that it appears in the 0 band
for one element and in the 1 band for another element. All signals must also
be assumed to be of sufficient amplitude to pass through the region of indeter-
minacy, as they change, and to enter the opposite band from which they came after
reaching the region of indeterminacy.

Although voltage tolerances are important, no assumptions are made 
concerning speeds. Therefore, we must inspect all possible sequences of states 
which may occur. A circuit will be said to be speed-independent if the final
condition of the circuit does not depend on the relative speeds of the logical
elements which make it up. This assumes that the circuit was started in a given 
initial condition, and then allowed to run until it either stopped or entered 
some sort of never-ending cycle. In terms of computers this means that a speed-
independent computer will always produce the same results and stop in the same
way when fed a certain problem, regardless of how the speeds of its elements 
may vary. Speed-independent circuits have properties which are of particular 
importance in the design of reliable asynchronous circuits. For this reason we 
shall shift our attention from the more general problem of designing asynchronous 
circuits to the design of speed-independent circuits. The concept of speed- 
independence may be easily confused with checking. It is possible to have a 
speed-independent circuit which is not checked and a checked circuit which is
not speed-independent. For example, one may check the operation of a given
circuit by duplicating it, element for element, and comparing the resulting 
signals from the two circuits with a third circuit. If the signals disagree the 
third circuit will cause the computer to stop, indicating an "error" in operation. 
This "error" may be due to the unusually slow or fast action of a circuit element, 
or it may be due to the appearance of a spurious signal. If the former situation 
exists we see that the circuit is not acting in a speed-independent way by our
previous definition but the "error" will nevertheless be detected so that the 
circuit is checked. On the other hand, if the original circuit has been made 
speed-independent it would be possible for a spurious signal from some element 
to give rise to an incorrect calculation but it could never result from a varia- 
tion in the speeds of the elements. In other words, unusually fast or slow 
action of any of the elements is simply not regarded as an error-producing 
malfunction. 

Three techniques are used in the design of speed-independent circuits.
These techniques are generally not used independently but together, to give the
most effective results. They permit the design of a class of circuits which,
while speed-independent, is slightly more restricted in character than the
class of speed-independent circuits. The circuits in this more restricted class,
which we shall call semi-modular, have the property that no element in such a
circuit which is excited (the signal which it produces is tending to change) is
ever allowed to pass to equilibrium unless the signal produced by the element
actually does change. It can be shown that if this condition is adhered to for
all elements and all possible states of the circuit, then the circuit must be
speed-independent. The restriction that circuits be semi-modular is not severe
since it still permits the paralleling of operations taking place within the 
circuit. All of the essential circuits which are required in the control and 
arithmetic units of a computer have been designed by using the three techniques 
to be described here. Any parallelism which is possible in the operations may 
be retained in the semi-modular design. 

An example of a violation of semi-modularity is shown in the familiar
flipflop circuit of Figure 5.1. This flipflop, consisting of two AND NOT
elements, is in an unstable condition in which the signals appearing on all lines
have the value 1.




Figure 5.1
Flipflop 

The flipflop could have reached this condition if both incoming signals were
initially 0 and if both were then changed to 1 simultaneously. The outputs
will go to either the 1, 0 or the 0, 1 configuration depending upon which of the
AND NOT elements acts first. When such a change occurs, the AND NOT element
which failed to act will be brought into equilibrium and the condition of
semi-modularity will be violated. This circuit is also not speed-independent,
since either of two final conditions is possible depending upon the relative
speeds of the elements.
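
The race just described can be traced with a small Python model of two
cross-coupled AND NOT elements. The model treats each element as acting
instantaneously when its turn comes, which is a simplifying assumption; the
function names are illustrative only.

```python
def and_not(x: int, y: int) -> int:
    """AND NOT element: the output is NOT (x AND y)."""
    return 1 - (x & y)


def excited(in1: int, in2: int, q1: int, q2: int) -> set:
    """Elements whose present output differs from what their inputs call for.
    Element 1 sees (in1, q2); element 2 sees (in2, q1)."""
    result = set()
    if and_not(in1, q2) != q1:
        result.add(1)
    if and_not(in2, q1) != q2:
        result.add(2)
    return result


def settle(order, in1=1, in2=1, q1=1, q2=1):
    """Let the elements act one at a time in the given order until no
    output changes, starting from the state with all lines at 1."""
    changed = True
    while changed:
        changed = False
        for element in order:
            if element == 1:
                new = and_not(in1, q2)
                if new != q1:
                    q1, changed = new, True
            else:
                new = and_not(in2, q1)
                if new != q2:
                    q2, changed = new, True
    return q1, q2
```

Both elements are excited in the starting state. Whichever acts first
changes its output, and the other then returns to equilibrium without ever
having changed, which is the violation of semi-modularity; `settle((1, 2))`
and `settle((2, 1))` end in the two different final configurations.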

5.7 Method of Combining Blocks 

In the first of the three design techniques one makes use of previously- 
designed circuits and connects them together to form more complex circuits. By 
following certain rules it is possible to retain the property of semi-modularity 
during the interconnection process. 

This method will be illustrated by connecting two circuits, a shifting 
register and a counter, to form a circuit which will carry out any given number 
of shifts. Let us assume that the counter contains an element "a" whose signal 
changes from 0 to 1 to 0, n times before the circuit stops. The count n may be
made to depend on the initial setting of the counter. The shifting register
has an element "b" which changes from 0 to 1 to 0 every time a shift takes
place. However, the shifting register is so constructed that the shifting will
continue indefinitely provided the element "b" is able to continue acting. Both
elements "a" and "b" may be required to feed their signals to the inputs of 
other elements in their respective circuits. A possible method for interconnect- 
ing the two circuits is shown in Figure 5.2. In this interconnection one of the 
elements, say "a", is made to feed a NOT element which in turn feeds all of 
the elements previously fed by "b". Element "b", on the other hand, is made 
to feed all of the elements previously fed by "a". 




Figure 5.2
Method for Connecting Two Circuits
(Original Circuits; Connected Circuits)

With this interconnection the counter and the shifting register will act in 
turn until finally when the counter stops the action of the shifting register 
will also be arrested. That such alternate action does, in fact, take place 
may be verified by tracing the changes in the signals. Let us assume that
initially the elements "a" and NOT have output 1 and that "b" has output 0.
Thus, the NOT element is excited and tending to produce the output 0, while
both the counter and shifting register are quiescent. When the NOT element
acts, the shifting register will begin the process of shifting and upon com-
pletion of the first part of the operation, will cause "b" to go to 1. This
initiates a count in the counter which responds by driving "a" to 0. The
process may now be repeated, interchanging "0"s and "1"s, returning us to the
initial configuration. The time required for the combination to come to rest
will be equal to the time taken by the counter previously, plus the time for
n shifts, plus 2n operation times of the NOT element. If the NOT element is
omitted when the circuits are interconnected it is impossible to achieve the
alternate actions of the two circuits. In such a case either the signals from
"a" and "b" are different and both circuits are quiescent indefinitely, or
both signals are the same with the result that both circuits are tending to
change in such a way as to lead to behavior which is not speed-independent.

A saving in time can be achieved by having the counter and the shifting
register operate in parallel so that the times will not be strictly additive.
This requires a different type of interconnection, and also that we postulate a
new logical element. This element, labelled "c" in Figure 5.3, can actually be
constructed from conventional AND, OR, and NOT elements, but we shall imagine
it as a single element having two inputs and one output. Elements "a" and "b"
are connected to the inputs of "c", and "c" is used to feed all the elements
which were previously fed by "a" and "b".




Figure 5.3 
Alternative Method of Connection 

The logical properties of the "c" element are such that it will duplicate the 
signals at its inputs provided they agree, but if they disagree it retains its 
previous signal. In the latter case it has what may be called memory. The 
Boolean expression $ab \lor ac \lor bc$ describes the "c" element's behavior. With
the interconnection of Figure 5.3, the counter and shifting register execute
their operations in parallel. If the counter acts more rapidly than the 
shifting register, then the initiation of the next count will be delayed until 
the signal from "b" appears and the signal from "c" is therefore caused to 
change. If the shifting register acts more rapidly, then the next shift will 
be delayed until the signal from "a" appears. The time taken by this combi- 
nation is the greater of the times taken by the counter and shifting register 
plus the time taken for 2n operations of the "c" element. 
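
The behavior of the "c" element can be written out as a short Python
sketch. It models only the logical rule (the output follows the inputs when
they agree and is otherwise retained) and ignores operation times; the
function names are illustrative.

```python
def c_element(a: int, b: int, c: int) -> int:
    """Next output of the "c" element: ab OR ac OR bc, where c is the
    present output of the element."""
    return (a & b) | (a & c) | (b & c)


def trace(inputs, c: int = 0):
    """Outputs of the element as a sequence of (a, b) input pairs is applied."""
    outputs = []
    for a, b in inputs:
        c = c_element(a, b, c)
        outputs.append(c)
    return outputs
```

For instance, `trace([(1, 0), (1, 1), (0, 1), (0, 0)])` shows the output
waiting at 0 until both inputs are 1, then holding 1 until both inputs have
returned to 0.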

Even more complicated interconnection schemes than these can be
devised involving two or more simple circuits and possessing more elaborate
parallel-sequential relations between their operations, but the principles are
well illustrated by the examples given here. All such techniques preserve the
property of semi-modularity and hence are suitable for the design of circuits.

5.8 Method of Simulating Synchronous Circuits 

The second technique permits one to design the basic simple circuits 
which carry out the fundamental operations required by a computer. These 
operations are first described in much the same way that one would describe 
them if one wished to design a synchronous computer, that is, by writing a set 
of Boolean equations to represent the operation in question. 

Although this technique is most effective in the design of circuits 
such as those in the arithmetic unit which handle information in a parallel 
fashion we take as an illustrative example the two-stage binary counter circuit 
which cycles through the never ending sequence 

(00), (01), (10), (11), (00), (01), etc. 

Simpler speed-independent counting circuits may be designed by using other 
techniques. If we designate the first and second signals of the pair as $Z_1$ and 
$Z_2$ respectively, we may write 

$$Z_1' = Z_1 \oplus Z_2, \qquad Z_2' = \bar{Z}_2 \tag{5.1}$$

to represent $Z_1'$ and $Z_2'$, the pair of signals which immediately follows $Z_1$ and 
$Z_2$. Here the symbol "$\oplus$" indicates EXCLUSIVE OR and $\bar{Z}_2$ indicates NOT $Z_2$. 
In a synchronous system, we may obtain the circuit directly from equations 
(5.1) by use of two elements each having unit time delay as shown in Figure 
5.4. 
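
As a check on equations (5.1), the following short Python sketch steps the pair of signals through six unit delays. The function name `next_state` is an assumption introduced for illustration only.

```python
def next_state(z1: int, z2: int) -> tuple:
    """One synchronous step: Z1' = Z1 XOR Z2, Z2' = NOT Z2."""
    return z1 ^ z2, 1 - z2

state = (0, 0)
sequence = []
for _ in range(6):
    sequence.append(state)
    state = next_state(*state)

print(sequence)
```

The list reproduces the counting sequence (00), (01), (10), (11), (00), (01) quoted above.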
Figure 5.4. Synchronous Binary Counter 



This circuit may be translated into a corresponding semi-modular circuit by 
means of the following systematic process. The result of this translation 
is shown in Figure 5.5. 

First, introduce two flipflops for each signal present in the 
synchronous circuit. One flipflop of each pair will be assumed to have the 
properties of the flipflop of Figure 4.5. The other will be represented as 
its dual, although by use of suitably placed NOT elements one may avoid using 
a dual flipflop. A dual flipflop is one in which the basic logical description 
of the flipflop is replaced by a dual description. Thus the dual of the 
flipflop shown in Figure 5.1 would be represented by two OR NOT elements. These 
flipflops serve the purpose of holding the information in the signals of the 
circuit of Figure 5.4. Two are required for each signal since, during the 
process of gating into one flipflop, the information is stored in the opposite 
flipflop. The flipflops corresponding to $Z_1$ are shown on the left in Figure 
5.5, while those corresponding to $Z_2$ lie on the right. 

Second, replace the logical elements of Figure 5.4 by a double wire 
system. Connect this system between the first and second flipflops of each 
pair and introduce direct connections from the second to the first. In the 
double wire system two lines are used to carry each signal so that during 
transmission of information one line will always be 0 and the other 1. The 
presence of 0 or 1 on the first line determines a bit of information. In 
order to distinguish one such bit from the next, we intersperse transmission 
with clearing of the lines, which occurs when both lines are given the same 
signal. It is this requirement of clearing which forces the use of a double 
wire system since three possible conditions may exist: 0, 1 or clearing. The 
double logic is shown in Figure 5.5 between the upper and lower flipflops. 
The two ANDS on the left and the ORS below then replace the EXCLUSIVE OR of 
Figure 5.4, while the NOT of Figure 5.4 is replaced by the crossed wires 
centered between the two flipflops in which the double wire signals of $Z_2$ are 
interchanged. 
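
The double wire convention may be sketched behaviorally in Python. This is only an illustration under stated assumptions: the names `CLEAR`, `encode`, `decode`, `dual_not` and `dual_xor` are hypothetical, the cleared condition is taken to be both lines at 1, and `dual_xor` models the logical behavior rather than the particular AND and OR elements of Figure 5.5.

```python
CLEAR = (1, 1)  # assumed clearing condition: both lines carry the same signal, here 1

def encode(bit: int) -> tuple:
    """Double-wire form of one bit: the first line carries the bit, the second its complement."""
    return (bit, 1 - bit)

def decode(pair: tuple):
    """Return the bit carried by a pair, or None while the lines are cleared."""
    return None if pair[0] == pair[1] else pair[0]

def dual_not(pair: tuple) -> tuple:
    """NOT needs no logical element: the two wires are simply crossed."""
    return (pair[1], pair[0])

def dual_xor(p: tuple, q: tuple) -> tuple:
    """Behavioral EXCLUSIVE OR: the output stays cleared until both inputs carry a bit."""
    x, y = decode(p), decode(q)
    return CLEAR if x is None or y is None else encode(x ^ y)

print(decode(dual_not(encode(0))), dual_xor(encode(1), CLEAR))
```

Crossing the wires of a cleared pair leaves it cleared, so the NOT never produces a spurious bit between transmissions.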

Third, insert a completion circuit for sensing the completion of one 
gating operation and for causing the next. This circuit requires the use of a 
"c" element of the type used in the circuit of Figure 5.3. This element is fed 
from completion signals from the four flipflops. Such completion signals are 
formed from the three elements to the right of each flipflop in Figure 5.5. 
They indicate whether or not the state of the flipflop agrees with the sense of 
the incoming signal. 

During the operation of this circuit the center or logical section is 
alternately cleared, so that all signals assume the value 1, and used for 
generating new quantities $Z_1'$ and $Z_2'$. In this example, the signals at the inputs to 
the upper flipflops are adequate to determine when the clearing operation has 
taken place, but when more complex logic is used it may be necessary to generate 
signals from additional AND elements within the logic which are also fed to the 
"c" element. In File Number 226(2) this technique is described and logical 
circuits are given for the various parts of an arithmetic unit. 



2. James H. Shelly: "Design of Speed-Independent Circuits", Digital Computer 
Laboratory File No. 226, 7/19/57 
Figure 5.5 

Asynchronous Binary Counter Simulating 
a Synchronous Binary Counter 



5.9 Method of Design from Change Charts 

The third technique which is used for speed-independent circuit design 
involves the use of what may be called change charts to design distributive 
circuits. Distributive circuits are slightly more restricted than semi-modular 
circuits but they still admit the possibility of parallel action. A change 
chart is simply a list of the signal changes which take place at the nodes of 
the circuit together with a statement of the chronological ordering of these 
changes. Since parallel changes are admitted, this ordering will be a partial 
ordering. Each change is written as a pair of positive integers $(\alpha, i)$. The 
integer i is the node number and the integer $\alpha$ is the number of the change at 
that node. 

The set $\Sigma$ of changes is required to have the following properties: 

1. The elements $(\alpha, i)$ of $\Sigma$ are partially ordered and satisfy the 
descending chain condition (i.e., that all descending chains are finite). 

2. There exists an integer n such that $i \le n$ for all $(\alpha, i)$ in $\Sigma$. 

3. If $(\alpha, i)$ is in $\Sigma$ and $\alpha > 1$, then $(\alpha - 1, i)$ is also in $\Sigma$ and 
$(\alpha - 1, i) < (\alpha, i)$. 

The use of the set $\Sigma$ to represent the behavior of a circuit is advantageous 
for three reasons. First, it represents a more natural way of describing the 
behavior of the circuit than does the conventional state diagram, since it 
deals with changes at individual nodes rather than states. Second, the change 
chart will contain many fewer elements than the state diagram when parallel 
changes occur. In such cases the number of changes is approximately equal to 
the logarithm of the number of states. Third, the circuit derived from the 
change chart design technique will be distributive and hence speed-independent. 

The change chart may be used to form a distributive lattice $\Lambda$ of 
n-dimensional vectors whose components are non-negative integers, by the 
following rule. 
A given vector $a = (a_1, a_2, \ldots, a_n)$ is in $\Lambda$ if and only if 

1. for each $a_i \ne 0$ there is a change $(\alpha, i)$ in $\Sigma$ with $\alpha = a_i$, and 

2. if $(\alpha, i) \le (a_j, j)$ then $\alpha \le a_i$. 

It may be shown that $\Lambda$ is a distributive lattice with a zero where one 
defines the lattice ordering relation $a \le b$ to mean $a_i \le b_i$ for all i = 1, ..., n. 
In this case the lattice operations of greatest lower bound and least upper bound 
correspond to numerical componentwise minima and maxima respectively. 
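
The construction of the lattice may be made concrete with a small Python sketch. Everything specific in it is an assumption chosen for illustration: the three-change chart on two nodes, the names `changes`, `order`, `precedes` and `in_lattice`, and the reading of the second condition as requiring that every change preceding a change already counted is itself counted.

```python
from itertools import product

# Hypothetical change chart on n = 2 nodes; each change is (alpha, i).
changes = {(1, 1), (2, 1), (1, 2)}
# Assumed ordering, listed as (earlier, later) pairs.
order = {((1, 1), (2, 1)), ((1, 2), (2, 1))}
n = 2

def precedes(x, y):
    """True if change x <= change y in the reflexive-transitive closure of order."""
    if x == y:
        return True
    return any(p == x and precedes(q, y) for p, q in order)

def in_lattice(a):
    """Every nonzero count names a change, and all its predecessors are counted too."""
    for i, ai in enumerate(a, start=1):
        if ai != 0 and (ai, i) not in changes:
            return False
    for alpha, i in changes:
        for j, aj in enumerate(a, start=1):
            if aj != 0 and precedes((alpha, i), (aj, j)) and alpha > a[i - 1]:
                return False
    return True

max_count = [max((al for al, i in changes if i == k), default=0) for k in range(1, n + 1)]
lattice = sorted(a for a in product(*(range(m + 1) for m in max_count)) if in_lattice(a))

# Componentwise minima and maxima of members should again be members.
closed = all(tuple(map(min, a, b)) in lattice and tuple(map(max, a, b)) in lattice
             for a in lattice for b in lattice)
print(lattice, closed)
```

For this chart the vector (2, 0) is excluded because the second change at node 1 must follow the first change at node 2.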

The lattice $\Lambda$ is closely related to the state diagram corresponding to 
the change chart $\Sigma$. Each vector a in $\Lambda$ may be taken as corresponding to a state 
of the circuit. This correspondence is a many-to-one correspondence since more 
than one vector a may correspond to a given state. The correspondence is set up 
as follows: Let $u = (u_1, u_2, \ldots, u_n)$ be the initial state of the circuit. Here 
$u_1, u_2, \ldots, u_n$ are the binary signals on the nodes 1, 2, ..., n. Then the state 
$v = (v_1, v_2, \ldots, v_n)$ is formed by letting $v_i$ = residue $(a_i + u_i)$ modulo 2. Thus 
the component $a_i$ of a may be regarded as the number of changes which have occurred 
at node i when going from state u to state v. 

One may determine whether or not node i is excited or in equilibrium 
in any state v corresponding to a given vector a. The rule which permits this 
determination may be stated as follows: 

Node i is excited in state v corresponding to C-state a if and 
only if $(a_i + 1, i)$ is in $\Sigma$ and $a_j \ge \beta$ for all $(\beta, j)$ in $\Sigma$ for which 
$(\beta, j) < (a_i + 1, i)$ and $j \ne i$. 

Using this rule we may determine whether or not any node i is 
excited for each vector a. If it should happen that two such vectors which 
give rise to the same state v disagree as to whether or not one or more nodes 
are excited, then we say that the change chart $\Sigma$ is not realizable, while if 
no such disagreement occurs it is realizable. If the change chart is 
realizable we may write a set of Boolean functions which yield the equilibrium 
and excitation conditions given. If these functions are used as elements in a 
circuit then it will have $\Lambda$ for its state diagram and will follow the behavior 
described by $\Sigma$ when placed in state u. This represents a completely systematic 
synthesis procedure, provided the original change chart is realizable. If it is 
not realizable we may always make it realizable by introducing additional nodes 
and corresponding changes on these nodes without altering the original partial 
ordering. Systematic methods have been devised for introducing such additional 
nodes. 
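
The excitation rule and the realizability test can be tried on small examples with the following Python sketch. The two change charts, the initial state and all function names are assumptions made for illustration; membership in the lattice is taken to mean that each nonzero count names a change and that every predecessor of a counted change is also counted.

```python
from itertools import product

def precedes(order, x, y):
    """True if change x <= change y in the reflexive-transitive closure of order."""
    return x == y or any(p == x and precedes(order, q, y) for p, q in order)

def vectors(changes, order, n):
    """All vectors of change counts that respect the partial ordering."""
    top = [max((al for al, i in changes if i == k), default=0) for k in range(1, n + 1)]
    for a in product(*(range(m + 1) for m in top)):
        named = all(ai == 0 or (ai, i) in changes for i, ai in enumerate(a, 1))
        closed = all(al <= a[i - 1] for al, i in changes
                     for j, aj in enumerate(a, 1)
                     if aj and precedes(order, (al, i), (aj, j)))
        if named and closed:
            yield a

def excited(changes, order, a, i):
    """Node i is excited if its next change exists and all prior changes at other nodes are done."""
    nxt = (a[i - 1] + 1, i)
    return nxt in changes and all(a[j - 1] >= beta for beta, j in changes
                                  if j != i and precedes(order, (beta, j), nxt))

def realizable(changes, order, n, u):
    """Vectors mapping to the same state v must agree on which nodes are excited."""
    seen = {}
    for a in vectors(changes, order, n):
        v = tuple((ai + ui) % 2 for ai, ui in zip(a, u))
        exc = frozenset(i for i in range(1, n + 1) if excited(changes, order, a, i))
        if seen.setdefault(v, exc) != exc:
            return False
    return True

# Hypothetical chart 1: node 1 changes twice, the second time only after node 2 has changed.
chart1 = ({(1, 1), (2, 1), (1, 2)}, {((1, 1), (2, 1)), ((1, 2), (2, 1))})
# Hypothetical chart 2: one independent change at each of two nodes.
chart2 = ({(1, 1), (1, 2)}, set())

result1 = realizable(*chart1, 2, (0, 0))
result2 = realizable(*chart2, 2, (0, 0))
print(result1, result2)
```

In the first chart the vectors (0, 1) and (2, 1) give the same state but disagree on whether node 1 is excited, so an additional node would be needed; the second chart is realizable as it stands.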

One of the weaknesses of this method lies in the fact that the Boolean 
functions obtained in this way may not correspond to the relatively simple 
elements which have been discussed in Chapter 4. By systematic introduction of 
additional nodes, even if they are not required for realizability, one may also 
remove this objection. 

The three methods summarized have proved workable in the design of 
logical circuits for the arithmetic unit and control. Whether or not the 
arithmetic unit should be designed in this fashion will be decided on the basis of 
speed and reliability as described in Chapter 3. It is felt that gains in speed 
and reliability will result if the control is made speed-independent. The third 
design technique can be used most effectively in this area to simplify the design 
and to allow a greater degree of parallelism than would otherwise be possible. 

We expect the Illiac to be used effectively in carrying out the 
systematic part of the third design procedure. A use which has already been 
made of the Illiac is in the testing of circuit designs. Programs have been 
written for simulating the behavior of circuits with the Illiac and testing them 
for semi-modularity and for correct sequencing.(3) Without these programs the 
design of semi-modular circuits would be virtually impossible since the checking 
process is usually too tedious to be carried out by hand. 

3. W. Scott Bartky: "Complete Circuit Analyzer", Library Routine Q3, Digital 
Computer Laboratory, 6/14/56. 

W. Scott Bartky: "Single Circuit Analyzer", Library Routine Q4, Digital 
Computer Laboratory, 11/20/56. 

James Shelly: "Complete Circuit Analyzer", Library Routine Q5, Digital 
Computer Laboratory, 7/22/57. 
CHAPTER 6 
MEMORY 



6.1 Introduction 

The problems of organization discussed in Chapter 3 would have 
been greatly simplified if it had been found possible to increase the speed 
of memory devices by as great a factor as the speed of arithmetic circuits. 
A survey of the various forms of memory which have been used or proposed shows 
that very high speeds can indeed be obtained, for example, by using registers 
similar to those of the arithmetic unit for storage purposes. The cost of 
such very fast memories is excessive, except possibly as small auxiliary parts 
of a memory hierarchy. For the main random-access memory something much less 
costly is required. Our efforts here have been largely devoted to a study of 
ferrite-core memories. Some preliminary study of electrostatic memories and 
diode-capacitor memories was also made.(1,2) These tests did not determine 
definitely the limits of speed of either of these devices even when considered 
as small auxiliary memories. However, it became apparent that work on these 
lines was less likely to be rewarding than work on magnetic memories, so 
further effort was concentrated on the latter. 

The speed of conventional ferrite-core memories, which use a 
three-dimensional coincident-current method of selection, is limited by the resulting 
2:1 current selection ratio. For available cores this fixes the minimum cycle 
time for such memories at about 4 μs. Other arrangements of ferrite cores 
have been studied, one of which, the word-arrangement memory described in 
section 6.3, may be several times faster. 



1. G. H. Leichner: "Proposed High-Speed Williams Memory Tests", Digital 
Computer Laboratory File No. 219, May 28, 1957. 

2. K. C. Smith: "Diode-Capacitor Memory", Digital Computer Laboratory File 
No. 218, May 10, 1957. 
Memories using thin magnetic films give promise of high speeds but 
are not discussed in this report because much further research is needed before 
their potentialities can be properly assessed. Memories using ferrite cores 
with more than one hole are also omitted in this study because of lack of data 
concerning them. 

6.2 Comparison of Some Types of Memory 

In this section an attempt is made to show how increased speed is 
related to increased complexity for several types of memory. In order to form 
a useful basis for comparison of diverse forms of memory some over-simplifi- 
cation has been necessary. Since both reliability and cost of maintenance may 
be expected to be related to the number of tubes, transistors and diodes used, 
the comparison has been based on a count of these active elements. Initial 
cost would, of course, be influenced by other factors also. 

To reduce the count of active elements to a single measure, a weighted 
total has been given which is equal to T + 2S + 5L + 0.5D where T, S, L and D 
refer to the numbers of transistors, small tubes, large tubes and diodes 
respectively. The weighting factor 5 for large tubes is based on expected lifetime as 
compared with transistors, and the factor 2 is based on the assumption that most 
of the small tubes (pentodes or double triodes) could be replaced by transistors, 
but that on the average twice as many transistors would be required. 

It would be desirable to replace the high-current tubes also with 
transistors. If, as now seems likely, this is possible, some change in the 
weighted totals would result but the relative position of the various memory 
designs would not be significantly altered. 

The comparison of speeds, in the last column of Table 6.2, is also 
somewhat qualitative because, with the exception of conventional core memories, 
and registers, none of the memories compared in the table have been built with 
the speeds shown. The data given in the table were mainly calculated by extra- 
polation from experimental results obtained using models of parts of memories. 
Considerable variety is possible in the design of any type of memory 
so that certain assumptions are necessary. For core memories we have assumed 
that 

1. current drivers require 1 large and 2 small tubes, 

2. signal amplifiers and related logical circuits require 9 small 
tubes per digit, and 

3. decoding networks followed by level restorers to provide sufficient 
signal to actuate current drivers require 40 transistors for 8 
drivers or 288 transistors for 64 drivers. 

There may be some question whether better values may be obtained for 
the expenditure of a given number of tubes by making several separate memories, 
or by making a memory in which blocks of several words are read at once in 
place of a single memory with one-word access. The formulas below are made 
sufficiently general to allow such comparisons to be made. In addition, results 
for certain arrangements are given in tabular form. 

Let A be the number of separate memories, B the number of n-bit words 
to be read simultaneously as a block, and C the number of blocks in each memory. 
The total number of words is then ABC. A conventional memory requires $4C^{1/4}$ 
current drivers while a word-arrangement memory(3) requires $2C^{1/2}$ current drivers. 
A conventional memory requires nB drivers for digit-inhibit windings, and a 
word-arrangement memory requires 2nB such drivers. 

On the basis of the above assumptions the following active components 
are required: 



3. See section 6.3. 
Table 6.1 

Total Number of Active Elements for ABC n-bit Words 

Elements          Conventional Memory         Word-Arrangement Memory 

Large Tubes       $A(nB + 4C^{1/4})$          $A(2nB + 2C^{1/2})$ 
Small Tubes       $A(11nB + 8C^{1/4})$        $A(13nB + 4C^{1/2})$ 
Transistors       $20AC^{1/4}$                $10AC^{1/2}$ 
Weighted Total    $A(27nB + 56C^{1/4})$       $A(36nB + 28C^{1/2})$ 
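
The element counts can be evaluated numerically with a short Python sketch. The function name `element_counts` and its arguments are hypothetical. The exponents on C (one quarter for the conventional memory, one half for the word-arrangement memory) are treated here as assumptions; they are adopted because, with the stated tube and transistor requirements per driver, they reproduce the per-word figures listed for the 8192-word memories in Table 6.2.

```python
def element_counts(A, B, C, n, kind):
    """Return (large tubes, small tubes, transistors, weighted total) for ABC n-bit words.

    kind is "conventional" or "word"; the exponents on C are assumed values.
    """
    if kind == "conventional":
        root = C ** 0.25
        large = A * (n * B + 4 * root)
        small = A * (11 * n * B + 8 * root)
        transistors = 20 * A * root
    else:
        root = C ** 0.5
        large = A * (2 * n * B + 2 * root)
        small = A * (13 * n * B + 4 * root)
        transistors = 10 * A * root
    # Weighted total T + 2S + 5L; no diodes are used in the core memories.
    weighted = transistors + 2 * small + 5 * large
    return large, small, transistors, weighted

words = 8192
conv = element_counts(A=1, B=1, C=words, n=52, kind="conventional")
word = element_counts(A=1, B=1, C=words, n=52, kind="word")
conv_per_word = conv[3] / words
word_per_word = word[3] / words
print(round(conv_per_word, 2), round(word_per_word, 2))
```

Other combinations of A, B and C may be substituted to compare separate memories against block readout, which is the comparison the formulas are intended to allow.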



Table 6.2 
Comparison of Types of Memory 

                              Active Elements per Word (52-bit) 
                                                                      Operation 
Type of Memory             Large   Small   Trans-   Diodes  Weighted  Time 
                           Tubes   Tubes   istors           Total     (μs) 

1. Conventional Core       .011    .080    .023             .24 
   Memory, 8192 words 
2. Same, 4-word blocks     .029    .29     .016             .74       1.5 (2) 
3. Word-Arrangement Core   .035    .12     .11              .54       1.5 
   Memory, 8192 words 
4. Same, 2-word blocks     .040    .20     .08              .69       .75 (2) 
5. Same, 4-word blocks     .063    .36     .06              1.1       .37 (2) 
6. 8-word Delay Lines                      50       50      75        (3) 
7. Registers                               260      1200    860       .05 

(1) Based on existing memories. Up to 50% greater speed may be 
possible. 

(2) Time per word (assumes all words in block used). 

(3) Average access time. 
In Figure 6.1, the number of active elements required is plotted 
against total number of words, ABC. The different curves are labelled by the 
values A, B followed by c for conventional memories and w for word-arrangement 
memories. 

Three systems might be considered roughly comparable since their 
(not necessarily random) access-time per word is 1.5 μs. These are the 
word-arrangement memory reading one word at a time (1, 1w), the conventional memory 
with four-word blocks (1, 4c) and four separate conventional memories (4, 1c). 
Of these the word-arrangement memory requires fewest active elements, although 
this advantage is lost for very large memories. 

6.3 Word-Arrangement Memory 

The word-arrangement memory first proposed by the National Bureau of 
Standards(4) may be made considerably faster than the conventional core memory. 
There is no limitation on current selection ratio for the read pulse, and a 3:1 
selection ratio is possible for the write pulse, while conventional memories are 
limited to 2:1 for both. An access time of 1.5 μs should be possible with the 
word-arrangement memory. 

Depending on the choice of control system, the memory may be constructed 
to read out 4-word blocks or to read out separate words. The latter arrangement 
is illustrated in Figure 6.2. The method of operation can be seen from this 
diagram. 

The memory core matrix consists of 8192 columns and 2x52 rows of cores, 
divided in some convenient way into separate frames. Each column makes up one 
block of words, and a pair of adjacent cores represent each bit. One wire, 
labelled W, runs through all cores of one block of words. Three other wires 
perpendicular to this one run through each core. Two of these carry pulses from 
bit-selection drivers and the third is the sensing wire. 



4. National Bureau of Standards: "Progress on Computer Components", October 
1954 - March 1955. 
[Plot of number of active elements against total words ABC] 

NOTE: Curves are marked with values of A, B followed by c for conventional 
and w for word-arrangement. 

Figure 6.1 
Comparison of Word-Arrangement 
and Conventional Core Memories 
Figure 6.2 
Proposed Core Memory 
The reason for using two cores to represent each bit is primarily to 
provide a constant load for the switch cores through the W wires. In the 
arrangement described there will always be 52 cores changing state on each pulse 
no matter what combination of 0's and 1's is stored. Some additional advantages 
of the two-core representation are mentioned later in this chapter. 

The sensing process is destructive and must be followed by a "rewrite" 
process. In storing information, the "write" process must be preceded by a 
"clear" process which is the same as the sensing process. Thus both sensing 
and storing require essentially the same sequence of operations. This sequence 
is described in the next paragraph. 

A signal indicating that the memory is to be used sets flipflop 1, 
allowing the address to be gated through the decoder to the X and Y drivers. 
The Y drivers are normally on, permitting the flow of current, biasing the switch 
cores in a given state. The current in the selected Y winding is reduced to 
zero, and the selected X driver is turned on, resulting in the reversal of the 
selected switch core. This produces a pulse of 1.2 amperes in the wire marked 
W which passes through all cores of a given word. A pair of cores representing 
one bit, for example the cores in the dotted rectangle, are normally in opposite 
magnetic states. The W pulse brings whichever core was in the minus state to 
the plus state. The direction of the resulting induced pulse in the sensing 
winding depends on which core was switched, that is, on whether a "1" or a "0" 
was stored. The sensing amplifier produces a pulse on either the "1" or "0" 
output and sets the corresponding bit of the register in the appropriate sense. 
The output of the amplifier also sets flipflop 2 and resets flipflop 1, turning 
on either driver $B_1$ or $B_0$, and returning the X and Y amplifiers to their normal 
states, thus producing a reverse pulse of 0.8 amperes in wire W. Assuming that 
driver $B_0$ was the one turned on, the 0.4 ampere current produced by this driver 
opposes the 0.8 ampere current in the W wire in the top core, leaving a net 
current of 0.4 amperes which does not change the state of this core. In the 
lower core of this pair, however, the currents add, giving 1.2 amperes, and 
returning this core to the minus state. The 0.4 ampere current in the $B_0$ 
winding passing through cores of unselected words is insufficient to alter 
their states. 
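
The current arithmetic of the read and rewrite pulses may be followed in a small Python model of one pair of cores. The model is a sketch under several assumptions not given in the text: a core is taken to switch whenever the net current reaches a hypothetical threshold of 0.8 ampere, a stored "0" is taken to mean that the lower core is in the minus state, and driver $B_1$ is assumed to act oppositely to driver $B_0$. The names `apply_current`, `read_pair` and `write_pair` are likewise illustrative.

```python
THRESHOLD = 0.8  # assumed switching current in amperes (not a figure from the text)

def apply_current(core, current):
    """Return the new state (+1 or -1) of a core subjected to a net current."""
    if current >= THRESHOLD:
        return +1
    if current <= -THRESHOLD:
        return -1
    return core

def read_pair(top, lower):
    """Read pulse of 1.2 A in wire W: whichever core was minus is brought to plus."""
    bit = 0 if lower == -1 else 1          # assumed coding of the stored digit
    return bit, apply_current(top, 1.2), apply_current(lower, 1.2)

def write_pair(top, lower, bit):
    """Reverse pulse of 0.8 A in W combined with 0.4 A from driver B0 or B1."""
    w, b = -0.8, 0.4
    if bit == 0:                           # B0 opposes W in the top core, adds in the lower
        top_current, lower_current = w + b, w - b
    else:                                  # B1 assumed to do the opposite
        top_current, lower_current = w - b, w + b
    return apply_current(top, top_current), apply_current(lower, lower_current)

# Pair initially storing "0": top core plus, lower core minus.
bit, top, lower = read_pair(+1, -1)
after_read = (top, lower)
rewritten = write_pair(top, lower, bit)
unselected = apply_current(-1, 0.4)        # bit-selection current alone, unselected word
print(bit, after_read, rewritten, unselected)
```

After the read pulse both cores are in the plus state, and the rewrite restores the original pattern while the 0.4 ampere bit-selection current alone leaves a core of an unselected word unchanged.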
The only difference in the operation cycle when information is stored 
instead of read is that the register is set by information from the arithmetic 
unit instead of from the sensing amplifier. 

The X and Y lines through the switch cores and the $B_0$ and $B_1$ lines 
form transmission lines or low-pass filters, and must be terminated to avoid 
reflections. The W lines, on the other hand, pass through relatively few 
cores and half of these cores are reversed during each pulse so that these lines 
form a resistive load which varies in magnitude during the cycle but which does 
not depend on the number stored because of the use of compensating cores. 
Therefore no external resistance is needed to terminate these lines, greatly reducing 
the power which must be supplied by the switch cores. 

In most memory applications, cores are completely switched from one 
maximum remanent state to the other. This is not necessary when compensating 
cores are used since reversible flux changes in the two cores cancel out and a 
good signal-to-noise ratio may be obtained even when the cores are only partly 
switched. 




CORE A          CORE B 

Figure 6.3 
Hysteresis Loops for a Pair 
of Cores Representing One Bit 
The two hysteresis loops in Figure 6.3 represent the two cores of a given bit. 
Before the first W pulse starts suppose core a is in state $A_0$ and core b in 
state $B_1$. The "read" pulse brings core b into state $B_0$ also. During the 
"write" pulse either core a or b will be brought to state $A_1$ or $B_1$ depending 
on the digit to be written, the amount of flux change being limited by the 
available flux change of the switch core. Partial switching has several 
advantages. It reduces the power dissipated in the memory cores, making the 
problem of heating less serious, and it reduces the size of the switch cores and 
the power required to drive them. Unfortunately, it does not reduce the value 
of the current required. A further advantage of partial switching lies in the 
possibility of non-destructive sensing which is discussed in the appendix to 
this chapter. 

6.4 Experimental Results 

Experimental tests on a one-word model(5) were made to verify the 
predicted 1.5 μs access-time, and to study problems related to current regulation, 
choice of switch cores, etc. The model memory consists of 100 cores (General 
Ceramics Type S1, Size F-394) strung on drive wires so that each pair of cores 
represents one bit of a 50-bit word. A block diagram is given in Figure 6.4. 
As only two bit-selecting amplifiers were constructed, these 50 bits cannot be 
selected independently. Only two combinations, 1010...10 and 0101...01, can 
be selected. Several operations are possible: reading and writing either word 
repeatedly, reading one word and writing the other alternately, or several simple 
sequences controlled by a counter. These operations should be sufficient to 
determine whether or not the first few cycles following a change in word stored 
are appreciably different from steady-state cycles. 

Certain limits are placed on the values of current pulses by the fact 
that the model is intended to represent part of a larger memory. 

1. Bit-selection pulses must not exceed the value of I/2 for S1 cores 
(0.42 amp) so that cores of unselected words would not be affected. 

5. R. W. McKay, N. N. Yu and C. Pottle: "A One-Word Model of a Word-Arrangement 
Memory", Digital Computer Laboratory Report No. 79, May 1957. 
Figure 6.4. One-Word Core Memory 



2. The output pulse from the switch cores on the write part of the 
cycle should not exceed the value of the bit-selection pulse plus 
I/2, but this limit is not critical since the pulse is applied only 
to cores of the selected word, and a small change in the core which 
is supposed to be held fixed does not matter as long as the other 
core changes much more. 

3. If the X and Y drivers are to be simple on-off switches the X 
pulse must not exceed the Y pulse by more than i^ for the switch 
core. 

A switch core may be a very inefficient pulse transformer because the 
energy required to reverse the magnetization of the core itself may exceed the 
energy output. It is readily shown that the energy lost in the switch core is 
least when the radius is least. However, the area of cross-section must be 
great enough to supply the required flux change. Therefore long narrow cylinders 
would be ideal switch cores. A more practical arrangement having the same 
desirable features is provided by a switch made up of a number of 0.08 inch 
ferrite cores strung together. The heating effects should also be small in such 
a switch because of the large surface-to-volume ratio. 

Switches made up of small ferrite cores in this way have been used in 
most of the tests on the one-word model. A few tests were made using 
permalloy-ribbon cores. Ferrite cores seemed to be preferable to permalloy cores for 
switches with the type of load provided by the W lines in this model, because 
damped oscillation or "ringing" was produced by the permalloy cores. This 
effect could be eliminated by means of external resistances but at the cost of 
considerable wasted power. 

Partial switching of the memory cores, as mentioned at the end of 
section 6.3, can be produced by decreasing the number of ferrite cores used in the 
switch and thus limiting the total flux change. Since there is no direct method 
of measuring the state of magnetization, $\phi$, of a core at any instant, this 
quantity must be determined by integrating the output voltage curve, which is 
proportional to $d\phi/dt$. The output waveform from a memory core with partial switching 
is shown in Figure 6.5. The application of partial switching to produce a method 
of non-destructive sensing is described in the appendix. 
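
The integration of a sampled output voltage may be sketched in Python as follows. The trapezoidal rule, the function name `integrate_flux`, the single-turn sense winding and the triangular test pulse are all assumptions made for illustration and are not taken from the measurements reported here.

```python
def integrate_flux(times, volts, turns=1):
    """Cumulative flux change from sampled voltage, using the trapezoidal rule.

    The induced voltage is taken as turns * d(phi)/dt, so the flux change is the
    running integral of the voltage divided by the number of turns.
    """
    flux = [0.0]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        flux.append(flux[-1] + 0.5 * (volts[k] + volts[k - 1]) * dt / turns)
    return flux

# Hypothetical triangular pulse in arbitrary units of time and voltage.
times = [0.0, 1.0, 2.0]
volts = [0.0, 2.0, 0.0]
flux = integrate_flux(times, volts)
print(flux)
```

The final entry of the list is the total flux change, which for a partially switched core would be smaller than the change between the two maximum remanent states.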
Figure 6.5 
Output Waveforms for Partial Switching 



The complete read-rewrite cycle in the model takes 0.8 μs. However, 
in a full-scale memory, this basic cycle time may be increased due to variations 
in characteristics of tubes and other circuit components. The total 
random-access time of a large-scale core memory is dependent upon the capacity of the 
memory. One additional delay in a large core memory is the transmission time 
for pulses traveling through many cores. Since magnetic switches are used in 
the memory, transmission delay of pulses through these switches would also have 
to be accounted for. Estimates are available for the transmission time of pulses 
through memory cores (20 mμs per 1000 cores). An approximate measure of the delay 
in switch cores of the type used in the model was found to be 3 mμs per switch. 

To verify the cycle time of approximately 1.5 μs for a full-scale 
memory, various delays would have to be accounted for. The following table shows 
the delay times which might be expected for such a memory. 
Table 6.3 
Estimated Time Delays in a Full-Scale 
Word-Arrangement Core Memory 

Basic cycle               0.85 μs 
Decoding                  0.05 μs 
X or Y drivers            0.15 μs 
Bit-selection drivers     0.15 μs 
Readout amplifier         0.05 μs 
Setting the register      0.03 μs 
Transmission              T(n) μs 



The value of T(n) depends on how the sensing wires and digit selection wires are 
arranged. The maximum value for 8192 words is approximately 3 mμs for each 
of 128 switch cores plus 20 mμs for each thousand cores, which would give 
approximately 0.5 μs. However, this time may be reduced considerably by dividing 
the long transmission lines into sections and by various other means. A value of 
about 0.25 μs seems quite possible. 
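
The delay budget of Table 6.3 can be totalled with a few lines of Python. The names used are hypothetical, and the figure of 8192 memory cores traversed by a pulse is an assumption introduced only to evaluate the 20 mμs per thousand cores term; the text does not state how many cores lie along one line.

```python
# Fixed delays from Table 6.3, in microseconds.
FIXED_DELAYS_US = {
    "basic cycle": 0.85,
    "decoding": 0.05,
    "X or Y drivers": 0.15,
    "bit-selection drivers": 0.15,
    "readout amplifier": 0.05,
    "setting the register": 0.03,
}

def cycle_time_us(transmission_us):
    """Total cycle time: the fixed delays plus the transmission delay T(n)."""
    return sum(FIXED_DELAYS_US.values()) + transmission_us

def max_transmission_us(switch_cores=128, cores_in_line=8192):
    """3 mus per switch core plus 20 mus per 1000 memory cores, converted to microseconds.

    cores_in_line is an assumed value, not a figure given in the text.
    """
    return (3 * switch_cores + 20 * cores_in_line / 1000) / 1000

reduced = cycle_time_us(0.25)
worst = cycle_time_us(max_transmission_us())
print(round(reduced, 2), round(worst, 2))
```

With the reduced transmission delay of 0.25 μs the total is close to the 1.5 μs cycle time sought, while the unreduced transmission delay would add roughly a further quarter of a microsecond.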
Appendix to Chapter 6 
Possibility of a Non-Destructive Readout 

A possibility of non-destructive sensing arises from the fact that one 
of the cores representing a given bit may be only partially magnetized (or even 
practically demagnetized) whereas the other is completely magnetized. The 
slopes of the reversible magnetization curves for cores in these two states 
differ considerably. If the slope of the flux-current curve for the 
nearly-saturated core is represented by L henries and that of the partially-magnetized 
core by L' henries, the output voltage from the pair of cores when a current 
pulse passes through is $\pm (L' - L)\,\frac{di}{dt}$, where the sign depends on whether a "0" or "1" 
is stored. Provided the pulse is less than 0.4 amperes no permanent change in 
the state of the cores would occur. Some preliminary tests indicate that $L'/L$ is 
approximately 1.9 and that this ratio is almost independent of current so that 
comparatively small currents could be used. 

Preliminary experiments indicate that an extremely fast readout may be 
obtained by this system. In fact, the main delay may be that due to transmission 
of signals through the array of cores. To obtain the fastest readout times it 
might be necessary to apply this method only to a small section of the memory. 

Problems related to signal-to-noise ratio exist which have not yet been 
thoroughly investigated. These problems d© not appear to be insuperable, and 
if they can be overcome, a small memory with reading time of .2 to .3 u-s may 
be possible. Of course, the "write" time into this memory would be the same as 
in the destructive readout memory. 



CHAPTER 7 

INPUT, OUTPUT, AND AUXILIARY STORAGE 



7.1 Introduction

It is the purpose of this chapter to present a number of rather general 
considerations relating to the requirements for input, output and auxiliary 
storage devices necessary for a computer having the arithmetic and internal 
storage characteristics described in other parts of this report. The results are 
incomplete in that they are insufficient to indicate the detailed control structure 
and instruction code necessary. 

The input-output equipment of a computer plays two somewhat distinct 
roles: (1) it serves as a means of communication between the computer and other
automata or humans, and (2) it serves as an auxiliary storage medium for a computer. 
The distinction between auxiliary storage equipment and input-output equipment can 
sometimes be made in terms of the length of time information is stored on the 
equipment. Thus magnetic drums are quite often used as auxiliary storage for 
short-term storage of partial results of a calculation which are to be used fairly 
soon in a subsequent computation. Although magnetic tapes are used for this 
purpose they may also be used for storage of results away from the computer and 
for long periods of time. The use of the input-output equipment as a short-term 
auxiliary storage device is the most demanding one with respect to speed. 

In this chapter we shall review the characteristics of input-output 
devices now commercially available and compare their data rates, the number of 
bits per second that can be read from or written on these devices, with the 
multiplication rate of arithmetic units and internal storage access rates. 
Comparison of data rates will also be made with the rates of generation and 
consumption of data by the human user. 

For many scientific calculations the time spent in computation may be
expressed as

     T_c = F n N M

(cf. Chapter 1), where n is the number of multiplications per word stored in
the internal memory, N is the number of words stored in the memory, M is the
multiplication time, and F is a weighting factor which, for a sequential machine,
is usually between one and ten and for the computer under consideration would
probably be between one and five.

If the input-output equipment is such that a single word is read or
written in W seconds, then the time spent in reading and writing N words into the
internal memory is

     T_W = 2NW.

The adequacy of the input-output equipment as auxiliary storage would
be measured by the ratio T_W/T_c, which should be less than, or at most of the
order of, one. If this upper limit is used we obtain for a balanced computer
(and problem)

     T_W/T_c = 2WN/(nFMN) = (2W/M)/(nF).

Thus for a "scientific" calculation a quantity which plays a role in estimating
machine balance is W/M, the ratio of the read-write time per word to the
multiplication time.

For problems dominated by memory access-time it is expected that T_c
will be proportional to the product of the number of words in the internal
storage and the memory access-time. Thus the corresponding ratio is W/a, where a
is the memory access-time, or (W/M)/(a/M) ≈ (5/2) W/M.

The reciprocals of these ratios, that is, the quantities M/W and
a/W ≈ (2/5) M/W, are treated in the subsequent discussion and called input-output
factors. We define

     F_s = (M/W) x 10^3

and

     F_c = (a/W) x 10^3.
The condition given above may be written as

     F_s = 2000/(nF).

It is extremely difficult to predict the manner in which the quantity nF varies
over a range of problems. For many scientific problems whose data requirements
exceed the storage capacity of the main memory and drum, such as problems in
linear algebra, the value of nF may lie between 2 and 20. For other problems
it may range as high as 100. However, it is frequently possible to rearrange
the order of calculation in problems with small values of nF in such a way that
nF becomes 50 or 100, corresponding to values of F_s of 40 or 20. The higher values
of F_s demand faster input-output equipment. If F_s is 20 or 40 and nF is 100 or 50
respectively, then the time spent doing arithmetic is about the same as the time
spent in using magnetic tape as an auxiliary memory. Therefore, although it would
be very desirable to have magnetic tape units of the sort presently under
development, for which F_s is expected to be 200 or more, a very large class of
problems may be put into a form suitable for tape units presently available, for
which F_s is about 20. The need for faster equipment would be lessened if internal
memory capacity and the control were such that the input-output equipment could
be used as auxiliary storage at the same time that the internal memory and
arithmetic units were engaged in computation.
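
The balance condition can be tabulated for a few values of nF. The sketch below
simply evaluates F_s = 2000/(nF) and the ratio T_W/T_c = (2W/M)/(nF) with
W/M = 1000/F_s; the function names are illustrative and not part of the report.

```python
def balanced_factor(nF):
    """Input-output factor F_s at which tape time equals arithmetic time."""
    return 2000.0 / nF


def tape_to_arithmetic_ratio(F_s, nF):
    """T_W / T_c = (2W/M) / (nF), using W/M = 1000 / F_s."""
    return (2.0 * 1000.0 / F_s) / nF


for nF in (20, 50, 100):
    print(nF, balanced_factor(nF))
```

A ratio below one means the arithmetic unit, not the tape, sets the pace; for
example, a tape unit with F_s = 200 used on a problem with nF = 20 gives a ratio
of 0.5.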

7.2 Data Rate Characteristics of Scientific and Data Processing Computers

The data rates for punched card, magnetic tape, and printing equipment
are tabulated for a number of computers in Table 7.1. Data rates are reduced to
units of binary digits per microsecond, under the assumptions that:

1. Binary punching is used for cards.

2. One alphanumeric character is equivalent to six binary digits; one
decimal digit is equivalent to four binary digits.

3. Data rates are for one unit for any particular computer, even if
several units of the same type can be operated in parallel.

                        SINGLE UNIT DATA RATES, BITS/µs                     INPUT & OUTPUT FACTORS x 10^3

                                          PUNCHED CARDS     MAG. TAPE         PUNCHED CARDS    MAG.
COMPUTER               STORAGE  MULTIPLY  In       Out      (1 unit)  PRINT   In      Out      TAPE   PRINT

DATAMATIC 1000         4        ?         0.0144   0.0016   0.24      0.0108  300     33.3     5000   225
IBM 709                3        0.23      0.0036   0.0014   0.09      0.006   15.7    6.2      391    26
?                      ?        0.028     0.00384  0.00192  0.084     ?       137     68.7     3000   279
IBM 704                3        0.15      0.0036   0.0014   0.09      0.0011  24.0    9.3      600    7.3
JOHNNIAC               2.67     0.104     0.00384  0.0016   -         0.0163  36.9    15.4     -      157
LARC                   12       4.8       0.0096   0.0096   0.12      0.0156  2.0     2.0      25     3.25
UNIVAC SCIENTIFIC      4.5      0.151     0.00192  0.00192  0.0767    0.0078  12.4    12.4     507    51.6
ILLIAC (punched tape)  -        0.057     0.0015   0.0003   -         -       26.3    5.3      -      0.44
PROPOSED COMPUTER      33       13        -        -        -         -       -       -        -      -

("?" marks an entry that is not legible in the source; "-" marks a blank entry.)

                                    Table 7.1
                       Data Rates for Input-Output Devices



Data rates for storage and multiplication are found by dividing the number of
binary digits per word by the access and multiplication times respectively.

Also shown in the table are Illiac rates with punched paper tape
rather than punched cards. Experience with Illiac indicates that it is suitable
for "scientific" calculation, usable for intermediate problems such as statistical
work, and unsatisfactory for data processing applications. Illiac output is
normally on paper tape with printing done on an off-line basis, although on-line
operation of a printer is possible.

We shall employ the data of Table 7.1 in the following way:

1. For an indication of characteristics of card, magnetic tape, and
printing devices currently available,

2. As a means of extrapolating input-output requirements for the
computer whose design is discussed in this report,

3. As a means of determining internal storage access interruptions
for input-output or auxiliary storage purposes.

7.3 Magnetic Tape Characteristics

The discussion in the introduction to this chapter indicated that an
input-output factor F_s between 20 and 200 would enable the input-output equipment
to be used as auxiliary storage for a large and useful class of problems. It is
the purpose of this section to show that the requirements implied cannot be met
by devices slower than magnetic tapes and to indicate to what extent the
requirements can be met by magnetic tapes.

For the computer proposed, the data rate corresponding to a factor
F_s = 20 is 20 x 13 x 10^-3 = 0.26 bits/µs; for F_s = 200, the rate is 2.6
bits/µs. The lower rate is approximately that of the magnetic tape unit with
31 parallel channels, 80 bits/inch, and 100 inches/second, which is used with
the Datamatic 1000. The rate of 2.6 bits/µs corresponding to a factor F_s = 200
cannot now be met by commercially available conventional tape units. However,
it seems quite likely that within the next few years commercial development will
lead to magnetic tape units sufficiently fast to satisfy the requirements
corresponding to factors F_s of 200 or more. In order to maintain a continuous data
rate of 0.26 bits/µs it is necessary to use more than one tape unit so that
rewinding can proceed in parallel with reading or writing. The simultaneous
reading or writing on a number of tape units leads to a proportionate increase
in the factor F_s and the data rate at a cost in complexity of control.
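
The figures in this section follow from two small formulas, sketched below. The
multiplication rate of 13 bits/µs is the Table 7.1 entry for the proposed
computer; the function names are illustrative only.

```python
def required_rate(F_s, multiply_rate=13.0):
    """Data rate in bits/µs implied by an input-output factor F_s."""
    return F_s * multiply_rate * 1e-3


def tape_rate(channels, bits_per_inch, inches_per_second):
    """Raw tape data rate in bits/µs (bits per second divided by 10^6)."""
    return channels * bits_per_inch * inches_per_second / 1e6


print(required_rate(20), required_rate(200))
print(tape_rate(31, 80, 100))  # the Datamatic 1000 unit described above
```

The tape unit quoted in the text comes out at about 0.25 bits/µs, which is why it
is described as approximately meeting the F_s = 20 requirement of 0.26 bits/µs.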

Inspection of Table 7.1 indicates that it would be impractical to
provide punched card or punched paper tape equipment capable of a data rate of
0.26 bits/µs, since simultaneous operation of 18 of the fastest card readers,
100 paper tape readers, or 200 card punches would be required. The control
problems and programming difficulties inherent in simultaneous operation, as well
as the prohibitive equipment costs, preclude the use of paper tape or cards in
this manner.

7.4 Magnetic Drum Storage 

It was noted in Chapter 1 that problems requiring 50,000 to 100,000
words of data are to be expected. Further discussion in Chapter 2 indicates
that, with a relatively small increase in time, the bulk of the storage need not
be supplied in the random-access core storage unit, but can be supplied in the
form of a lower-speed non-random-access storage device. The characteristics of
auxiliary storage devices which fulfill the requirements outlined in Chapter 2
are described here.

A magnetic storage drum consists of a rotating cylinder coated with 
magnetic material. Other design features being equal, the storage capacity is 
a function of the surface area of the cylinder, and is increased in direct 
proportion to either the length or diameter of the cylinder. The speed of 
rotation is limited by the mechanical difficulties in maintaining close toler- 
ances between fixed reading and writing heads and the surface of the rotating 
cylinder. Roughly speaking, if the storage capacity is increased, the maximum 
allowable speed of rotation must be decreased. 
A drum similar to that used with Illiac would provide a storage capacity
of approximately 10,000 words, with the time for one revolution (α of Chapter 2)
equal to 17 milliseconds. The packing density is such that 2500 binary digits
are recorded on one track around the periphery by one recording head; the time
associated with each digit is then 17 ms/2500 = 6.8 µs. By reading and recording on
50 tracks in parallel, the minimum access time per word (β of Chapter 2) is
6.8 µs. The length of the cylinder is such that 4 sets of 50 tracks can be used,
so that the total storage capacity is 4 x 2500 = 10,000 words of 50 bits each.

A drum storage capacity of 20,000 or 30,000 words can be achieved by
use of 2 or 3 drum units of the type described. The cost is essentially the same
for each unit, but this alternative has obvious advantages if one of the mechanical
units should fail. The following characteristics are those of a commercially
available unit with greater storage capacity:

     6000 bits per track

     450 tracks

     α = 51 ms, β = 8.5 µs

     Storage capacity = 54,000 words.
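
Both sets of drum figures can be reproduced from the track layout. The sketch
below assumes 50-bit words recorded across 50 parallel tracks for both drums
(stated for the Illiac-type drum, inferred for the commercial unit from its
quoted capacity); the function name is illustrative.

```python
def drum_parameters(bits_per_track, tracks, revolution_ms, word_bits=50):
    """Return (beta in µs, capacity in words) for a parallel-track drum.

    beta is the time per digit position, i.e. the revolution time divided by
    the number of bits on one track. Each group of `word_bits` tracks holds
    `bits_per_track` words.
    """
    beta_us = revolution_ms * 1000.0 / bits_per_track
    capacity_words = (tracks // word_bits) * bits_per_track
    return beta_us, capacity_words


print(drum_parameters(2500, 200, 17))   # Illiac-type drum: 4 sets of 50 tracks
print(drum_parameters(6000, 450, 51))   # commercial unit listed above
```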



7.5 Input-Output Requirements for Data Generated or Consumed by Human Users

The typing and reading rates of human users can be expressed in the
units of bits/µs employed for Table 7.1. Assuming each typist or reader is
occupied 8 hours per day, the rates for 100 persons are as follows. For typing at
the rate of 60 words per minute,

     (60 x 5 x 5 x 100)/(2 x 60 x 10^6) = 0.00125 bits/µs,

and for reading at the rate of 300 words per minute,

     (300 x 5 x 5 x 100)/(2 x 60 x 10^6) = 0.00625 bits/µs.

For both calculations, words of 5 letters are assumed, with 5 bits per letter
and 50% redundancy assumed for English text.
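
The two human data rates can be recomputed as follows. This is a sketch of the
arithmetic only; the parameter names are illustrative, and the factor of 2
represents the 50% redundancy assumed for English text.

```python
def group_rate_bits_per_us(words_per_minute, persons=100,
                           letters_per_word=5, bits_per_letter=5,
                           redundancy_divisor=2):
    """Information rate of a group of people in bits per microsecond."""
    bits_per_minute = (words_per_minute * letters_per_word * bits_per_letter
                       * persons / redundancy_divisor)
    return bits_per_minute / (60 * 1e6)  # 60 seconds, 10^6 µs per second


print(group_rate_bits_per_us(60))    # 100 typists
print(group_rate_bits_per_us(300))   # 100 readers
```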

Thus there are commercially-available punched card readers and fast 
printers which, if used 24 hours per day, would require on the order of 1000 
typists and 250 human readers respectively. The point to be stressed here is 
that presently available equipment is more than adequate for handling data 
humanly generated or consumed. 

7.6 Other Modes of Data Generation or Consumption

It is to be expected that a wide variety of problems will be presented 
to the proposed computer, even though a relatively small variety of problems will 
occupy a majority of computing time. Special input-output facilities will be 
required for real time problems, and for handling data generated automatically, 
either in digital or analog form. For some problems digital punching or printing 
requirements would be reduced by a cathode ray tube output for analog or alpha- 
numeric presentation of data. It should also be recognized that many calculations 
will be performed on data stored on punched cards or paper tapes. 

In view of the requirements for magnetic tape units, it appears that 
with the exception of a direct data link for real time applications, the additional 
requirements discussed in the previous paragraph can be met by off-line magnetic
tape conversion equipment. 

7.7 Other Aspects of Machine Balance 

The characteristics of the drum storage unit impose requirements on the
capacity of the core memory. For transfer of data to or from the drum, the time
for transfer of a block of N words is, on the average,

     α/2 + Nβ,

where α and β are as defined in sections 2.1 and 7.4. If the average initial
access-time α/2 is to be not more than half of the total block transfer time,
then N must satisfy

     N ≥ α/(2β).

Thus the block size for β = 6.8 µs and α = 17 ms is at least 1250
words, and for β = 8.5 µs and α = 51 ms, the block size is at least 3000 words.
The capacity of the core memory must be sufficient for block transfers of the 
magnitudes indicated by drum characteristics as well as for storage of data and 
instructions otherwise required. The balance between core memory capacity and the 
characteristics of the drum unit represents an important consideration in the choice 
of the drum unit. 
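
The block sizes quoted above can be checked numerically. The sketch assumes an
average block transfer time of α/2 + Nβ (half a revolution of latency followed
by N word times), which is the form that reproduces the 1250-word and 3000-word
figures in the text; the function names are illustrative.

```python
def block_transfer_time_us(n_words, alpha_ms, beta_us):
    """Average time to transfer a block: half a revolution plus N word times."""
    return alpha_ms * 1000.0 / 2.0 + n_words * beta_us


def min_block_size(alpha_ms, beta_us):
    """Smallest N for which the latency alpha/2 is at most half the total time."""
    return alpha_ms * 1000.0 / (2.0 * beta_us)


print(min_block_size(17, 6.8))   # Illiac-type drum
print(min_block_size(51, 8.5))   # larger commercial drum
```

At the minimum block size the latency and the data transfer each take the same
time, so the drum spends half of each block access waiting for the data to
arrive under the heads.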

The data of Table 7.1 can be used to determine the relative number of
storage accesses required for input, output and auxiliary storage purposes. The
proposed computer is to have a storage rate of 33 bits/µs, and simultaneous use
of a card reader, card punch, magnetic tape unit, and a fast printer would result
in a combined data rate of 0.28 bits/µs. The ratio 33/0.28 = 118 indicates that
less than 1% of the accesses to the core memory are required for data transfers to
or from magnetic tape and input-output units.

7.8 Summary and Conclusions

The purpose of this chapter was to investigate the feasibility of meeting
the input-output and auxiliary storage requirements with commercially available
equipment. The bulk of the storage requirements can be met by one or more
commercially available magnetic drums; additional auxiliary storage requirements
can barely be met by the fastest of magnetic tape units now available.

For the purposes for which the proposed computer is intended, the
input-output devices serve as a means of communication with human users and, less
frequently, with other automatic devices. Presently available devices are more than
adequate when human limitations are taken into account, and for the less frequent
input-output uses, special equipment can be obtained.
CHAPTER 8



ARITHMETIC UNIT 



8.1 Introduction 



The purpose of this chapter is to describe a design for a binary-
parallel arithmetic unit. The majority of the arithmetic units in binary digital
computers now in existence are composed of an accumulator register A, a number
register M, a quotient register Q, an adder, and one or more complementing
circuits. For asynchronous operation, temporary accumulator and quotient
registers (Ā and Q̄) are provided. Figure 8.1 illustrates the interconnections
and gates (G) for an arithmetic unit employing a complementary representation
of negative numbers.

[Figure 8.1. Block Diagram of a Conventional Arithmetic Unit]

Independent of the structural details, addition is a basic operation
of most of the units now in existence, with multiplication performed as a sequence
of conditional additions and shifts. The time allotted to addition is sufficiently
long to allow for the worst case of a carry propagating from the least significant
digit to the most significant digit of the adder. Multiplication requires, for a
multiplier with n non-sign digits, n shifts and, on the average, n/2 additions.

Methods have been proposed for increasing the speed and efficiency of 
the arithmetic unit. These include: 

1. Use of circuitry to generate a "carry-completion" signal, i.e.,
to indicate the completion of the (longest) carry sequence in each
addition.(1) The average longest carry sequence is on the order of
log_2 n digits for random addends, each with n non-sign digits,
in contrast with the worst case of a carry sequence over n
digital positions now provided for.

2. Use of separate carry storage during multiplication with
assimilation of carries at the end of the multiplication
operation.(2)(3)(4) Aside from the final assimilation, the time
devoted to carry propagation is equivalent to a total of 2n
digits during the n-step process.

3. Recoding of the multiplier into digits which are -1 as well as
0 and 1 in such a way that the number of non-zero digits, and
consequently the number of uses of the adder, is reduced.(5) A
reduction from on the average n/2 to approximately n/3 additions or
subtractions can be achieved. Furthermore, it is possible to
shift two digital positions at each step, thereby halving the
number of gating operations for shifts, without increasing the
hardware requirements except for a conditional doubling circuit.



1. B. Gilchrist, J. H. Pomerene, and S. Y. Wong: "Fast Carry Logic for Digital
Computers", IRE Trans. on Electronic Computers, vol. EC-4, no. 4, pp. 133-136,
December 1955.

It is the goal in the design presented here to exploit the proposals
listed above as efficiently as possible. In particular, the use of separate
carry storage is extended to include a sequence of arithmetic operations;
numerical results are represented with carries unassimilated whenever possible.
Carry completion circuitry is used in two ways:

1. For the complete assimilation of carries when the conventional
representation is required.

2. For a partial assimilation to the sign digit, when the sign of
the unassimilated number must be known.



2. G. Estrin, B. Gilchrist, and J. H. Pomerene: "A Note on High-Speed Digital
Multiplication", IRE Trans. on Electronic Computers, vol. EC-5, no. 3,
p. 140, September 1956.

3. J. E. Robertson: "Preliminary Design of an Arithmetic Unit for Use with a
Self-Checking Binary Parallel Digital Computer", Digital Computer Laboratory
Report No. 19, June 1950.

4. Project Whirlwind, "Whirlwind I Computer Block Diagrams", Report R-127-1,
vol. 1, p. 23, M.I.T., September 1947.

5. The method of multiplier recoding was described to the author by David J. Wheeler
in 1951. It has recently been reinvestigated at the University of London, the
National Bureau of Standards, and Aberdeen Proving Ground, Maryland. The method
is similar to that described in Booth and Booth, Automatic Digital Calculators,
Academic Press, Inc., New York, 1953, pp. 44-47, although the latter do not fully
exploit the advantages of the method.

Hardware requirements for the proposed design are such that the accumulator (and
temporary accumulator) are augmented by a carry register (and temporary carry
register). The adder is replaced by a quasi-adder of equal complexity. A
circuit for assimilation of stored carries in the carry register with the content
of the accumulator is also required for conversion to the conventional
representation; it is proposed that circuitry also be provided for generating
completion signals for carries propagating during the assimilation.

The ramifications of the requirement that numerical results be held in 
unassimilated form whenever possible are extensive. They include: 

1. The best choice for representation of negative binary fractions 
is the two's complement representation. 

2. An entirely new analysis of overflow is necessary. 

These two statements can be partially justified by noting some features of a 
number represented in unassimilated form. First, the sign of the number in some 
cases is not known unless carries are assimilated, and second, with two registers 
used for representation of unassimilated numbers, the ranges of numbers
represented in the two registers can be extended during some sequences of
calculations. In contrast to conventional representations, there can be uncertainty
either as to the sign or as to the range, but not both. As is discussed in detail in later
sections, the sign uncertainty dictates that arithmetic methods be independent of 
sign insofar as possible. Thus, one of the complementary representations of 
negative numbers is preferable to the signed absolute value for addition and sub- 
traction, since the latter representation requires an inspection of the sign of 
the difference of two absolute values. A detailed analysis of shifting between 
A and Q during multiplication indicates that inspection of signs is at times 
required for the one's complement representation, but not for the two's comple- 
ment representation. 

The range uncertainty for unassimilated numbers poses problems in
overflow detection. In brief, three situations must be recognized:

1. It is apparent from the unassimilated digits that overflow has
occurred.

2. Overflow cannot be detected from the unassimilated digits, but
would be detected if carries were assimilated.

3. No overflow has occurred.

The two overflow cases are called unassimilated overflow (case 1) and assimilated
overflow (case 2). An overflow analysis is a necessity for an arithmetic unit
design, and is applicable to the following:

1. Detection of overflow in fixed point operations,

2. Automatic scaling for floating point operations,

3. Determining correct procedures following overflow occurring
temporarily during one step of a multiplication or division.

It will be shown that unassimilated overflow detection is sufficient for the three
requirements, and that assimilated overflow detection is necessary only when
assimilation is required for some reason other than overflow detection. This
approach is consistent with the goal that results should be left in unassimilated
form whenever possible.

The multiplication method proposed incorporates the following features:

1. The multiplier is recoded to reduce the number of additions or
subtractions. The rules for recoding the multiplier are such that no
special attention need be given to the sign of the multiplier. A
final step corresponding to the sensing of the sign digit of the
multiplier is required, regardless of the multiplier sign.

2. No special attention need be given to the sign of the multiplicand
if the accumulator can correctly accumulate either positive or
negative partial products. Explicitly, successive partial products
p_k (held in the accumulator) are related by the equation

     p_{k+1} = 1/2 (p_k + y_{n-k} x),

where the recoded multiplier digit y_{n-k} is -1, 0, or +1. The
sum (p_k + y_{n-k} x) may exceed range, but as a result of the shift
(i.e., multiplication by 1/2), the new partial product p_{k+1} is
again within range. The problem is one of determining the correct
sign digit to be inserted into the accumulator during the right
shift, and is closely related to the problem of detection of
overflow in addition and subtraction.

3. The use of a separate carry storage register in association with
the accumulator poses a problem during the right shift. The least
significant digit of the accumulator must be assimilated and
transferred into the quotient register. Further difficulties would
arise in shifting one digit from A to Q, particularly for the
multiplier recoding proposed, if negative numbers are represented in any
form other than two's complement.
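
The recurrence in feature 2 can be exercised with a small sketch. The recoding
rule used here is the canonical signed-digit (non-adjacent) form, which is one
common way of obtaining digits -1, 0, +1; it is an assumption for illustration
and is not claimed to be the rule adopted in this report. The multiplier is
taken as the fraction Y/2^n, and all names are illustrative.

```python
from fractions import Fraction


def recode(Y, n):
    """Signed digits d[0..n] (d[w] has weight 2**w) with sum equal to Y.

    Assumed rule: non-adjacent form, so no two neighbouring digits are both
    non-zero. Works for -2**n <= Y < 2**n without inspecting the sign of Y.
    """
    digits, v = [], Y
    for _ in range(n + 1):
        d = 0 if v % 2 == 0 else 2 - (v % 4)  # +1 if v = 1 mod 4, -1 if v = 3 mod 4
        digits.append(d)
        v = (v - d) // 2
    if v != 0:
        raise ValueError("multiplier out of range")
    return digits


def multiply(x, Y, n):
    """Form x * (Y / 2**n) by p_{k+1} = (p_k + y_{n-k} x) / 2 plus a final step."""
    d = recode(Y, n)
    p = Fraction(0)
    for k in range(n):            # y_{n-k} is the digit of weight 2**k
        p = (p + d[k] * x) / 2
    return p + d[n] * x           # final step for the leading (sign) digit


print(recode(7, 3), multiply(Fraction(3, 8), 5, 3))
```

The final addition of d[n] * x plays the part of the last step described in
feature 1; in this sketch it is performed for every multiplier, whatever its
sign.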

The non-restoring division process seems most suitable for the arithmetic
unit proposed. For a divisor y in M (Figure 8.1) and partial remainders
r_k (k = 0, 1, ..., n) in Ā and C̄, the process is described by the recursion
relationship

     r_{k+1} = 2r_k ± y,

where the sign is chosen in such a way that the relation |r_k| < |y| is satisfied
for each k, provided only that the dividend r_0 satisfies |r_0| < |y|.

In order to choose the proper sign for the recursion relationship, the
sign of r_k must be known. In the proposed design, r_k would be transferred with
a left shift from Ā and C̄ to form 2r_k in A and C, in parallel with the partial
assimilation of r_k to determine its sign. The fact that the partial assimilation
and the shift can be paralleled for non-restoring division, in contrast to
restoring division, dictates the choice of the division process. For a restoring
division, the divisor is subtracted from (or added to) the partial remainder to
form a tentative partial remainder. The sign of the tentative partial remainder
is then sensed (by a carry assimilation to the sign digit) to determine whether
the tentative partial remainder is gated from the adder to Ā and C̄ or the old
partial remainder is gated from A and C to Ā and C̄. To summarize, the steps,
in order of their occurrence, are:



     Restoring division

     1. Subtract (or add) divisor in M from partial remainder in A, C.
     2. Assimilate to sign from adder.
     3. Transfer to Ā and C̄.
     4. Transfer from Ā and C̄ to A and C with left shift.

     Non-restoring division

     1. Subtract or add divisor in M from A and C.
     2. Transfer to Ā and C̄.
     3. Transfer partial remainder from Ā and C̄ to A and C with left shift.
     4. Assimilate to sign digit.



None of the steps for restoring division can be paralleled; steps 3 and 4 of the
non-restoring division can be paralleled. In restoring division, step 3 is
conditional on the sign determined in step 2; in non-restoring division, step 1
is conditional on step 4 and can proceed after both steps 3 and 4 are complete.
The non-restoring division method requires that the assimilator be connected to
A and C, restoring division requires a connection to the adder; for circuit
reasons the former seems preferable.
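
The recursion r_{k+1} = 2r_k ± y can be followed step by step with exact
fractions. The sketch below assumes the usual sign rule for non-restoring
division (subtract the divisor when the partial remainder and divisor agree in
sign, add it otherwise) and treats a zero remainder as positive; both are
assumptions for illustration. Register transfers and carry assimilation are not
modelled, and the names are illustrative.

```python
from fractions import Fraction


def non_restoring_divide(dividend, divisor, steps):
    """Return (quotient digits, quotient, final remainder r_n).

    Each quotient digit q_k is +1 (divisor subtracted) or -1 (divisor added),
    and the remainders obey r_{k+1} = 2 r_k - q_k y.
    """
    if abs(dividend) >= abs(divisor):
        raise ValueError("requires |r_0| < |y|")
    r, y = Fraction(dividend), Fraction(divisor)
    digits = []
    for _ in range(steps):
        same_sign = (r >= 0) == (y > 0)
        q = 1 if same_sign else -1
        r = 2 * r - q * y
        digits.append(q)
    quotient = sum(Fraction(q, 2 ** (k + 1)) for k, q in enumerate(digits))
    return digits, quotient, r


digits, quotient, rem = non_restoring_divide(Fraction(1, 2), Fraction(3, 4), 8)
print(digits, quotient, rem)
```

After n steps the dividend equals quotient x divisor plus r_n / 2^n, so the
signed-digit quotient differs from the true quotient by less than 2^-n in
magnitude whenever |r_n| < |y|.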

A reduction in division time can be achieved if the divisor y is
standardized to lie in the range 1/2 ≤ |y| < 1, as would be the case for
conventional floating point operation. In some instances, the partial remainders
can be shifted left until the quantity in A and C is standardized; i.e., for
non-restoring division, until 1/2 ≤ |2r_k| < 1. The number of uses of the adder
and assimilator is thereby reduced, and if shifts of more than one digital
position are available, the number of gating operations can be reduced, with a
consequent reduction in shifting time.

The method is applicable to both restoring and non-restoring division,
although special treatment of the quotient digits is required for the latter.
In a conventional non-restoring division, the divisor is either subtracted or
added, with +1's and -1's, respectively, inserted as quotient digits. A
relatively trivial conversion to the two's complement representation is then
made. In the proposed method, the shift would require the inserting of 0 as a
quotient digit. It is possible to devise a serial method for conversion of
quotient digits, which are +1, 0, or -1, to conventional binary form. The
method requires that one reversed ternary digit be held, and the conversion to
binary is made one step later on the basis of the sign of the new partial
remainder.



8.2 Separate Carry Storage in a Binary Arithmetic Unit 

The structure of a conventional binary adder is indicated in block
diagram form in Figure 8.2, in which the symbols represent circuits whose
operations are summarized in Table 8.1.

[Figure 8.2. Conventional Adder]

     s_i     = a_i ⊕ m_i ⊕ c_i

     c_{i-1} = (a_i ⊕ m_i) c_i ∨ a_i m_i

     i = 0, 1, ..., n, where n denotes the least significant digit




In a conventional parallel asynchronous arithmetic unit, the digits a_i (i = 0, ..., n)
represent binary digits of an accumulator register A, the m_i are digits of a
number register M, the s_i are sum digits to be gated to a temporary accumulator Ā,
and the c_i are carry signals internal to the adder. The index i, for each register,
has the values 0, 1, ..., n, where n denotes the least significant digit. The
ultimate speed of the conventional adder is limited by the fact that sufficient
time must be allotted for the carry to propagate from the least significant to
the most significant digital position.
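
The two adder equations, s_i = a_i ⊕ m_i ⊕ c_i and
c_{i-1} = (a_i ⊕ m_i) c_i ∨ a_i m_i, can be sketched directly in software. The
loop below runs from the least significant position (index n) toward the most
significant (index 0), which mirrors the carry propagation that limits the
speed of the conventional adder. The function name and the list representation
of registers are illustrative only.

```python
def ripple_add(a, m, carry_in=0):
    """Add two equal-length bit lists; index 0 is the most significant digit.

    Returns (sum digits, carry out of the most significant position).
    """
    s = [0] * len(a)
    c = carry_in                                  # carry into position n
    for i in range(len(a) - 1, -1, -1):
        s[i] = a[i] ^ m[i] ^ c                    # s_i = a_i xor m_i xor c_i
        c = ((a[i] ^ m[i]) & c) | (a[i] & m[i])   # c_{i-1}
    return s, c


print(ripple_add([0, 1, 0, 1], [0, 0, 1, 1]))     # 5 + 3
```

In the worst case, such as adding 1 to a register of all ones, the carry visits
every position in turn, which is the delay the separate carry storage described
next is intended to avoid.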

If register storage for carries is provided, the carry chain can be
broken either at points A or points B of Figure 8.2. For reasons given in
following sections, breaks at point B are preferable, and lead to the logical
structure of Figure 8.3. In Figure 8.3, the digits b_i would be gated to a
temporary carry register C̄ and digits c_i are held in the carry register C. The
logical structure of Figure 8.3, together with the registers A, Ā, C, C̄, and M
and suitable interconnecting gates, is sufficient for sequences of additions
and subtractions, including sequences required for multiplication. The true
sum at any given instant of time would be found by assimilating the contents of
A and C, or alternatively, of Ā and C̄.

[Figure 8.3. Proposed "Adder" Structure]



8.3 Separate Carry Storage for an Arbitrary Radix

The process of addition of an augend in unassimilated form and an
addend in assimilated form for an arbitrary integral radix r can be viewed as
consisting of two steps, as illustrated for the decimal system in Example 8.1.
Each step involves, for each digital position, the formation of a digitwise sum,
with the carry suitably displaced and stored separately from the sum (modulo r)
that may arise. During the first step, digitwise sums and carries are generated
from corresponding addend and augend digits; during the second step, sums and
carries are formed from the results of the first step. The final sum is
represented as a set of digits modulo r with an associated set of binary carry
digits. The formation of the conventional representation of the sum from the two
sets of digits (carry assimilation) is equivalent to an addition of the digitwise
sums and carries. The second step is required in order that the property,
"if c_{i-1} = 1 then a_i = 0", be preserved under addition.

     x = .157
     y = .548

     x + y            .605    a_i
                      .100    c_i

     z                .199    m_i
                      .794    u_i
                      .110    k_i

     x + y + z        .804    s_i
                      .100    b_i

                 Example 8.1

One feature of addition with separate carry storage is that assimilation
is not required during a sequence of additions. The unassimilated augend
is represented by digits $a_i$ ($a_i = 0, 1, \ldots, r-1$) and carries $c_i$
($c_i = 0, 1$); the addend is represented by digits $m_i$
($m_i = 0, 1, \ldots, r-1$), where the index $i$ increases with decreasing
significance of the digital position. The $a_i$ and $c_i$ are restricted to
values such that if $c_{i-1} = 1$, then $a_i = 0$. The first step in forming
the sum is

$$u_i = m_i + a_i \pmod{r}.$$

If $c_{i-1} = 1$, then $k_{i-1} = c_{i-1} = 1$.

If $c_{i-1} = 0$ and $m_i + a_i \ge r$, then $k_{i-1} = 1$.

If $c_{i-1} = 0$ and $m_i + a_i < r$, then $k_{i-1} = 0$.

It should be noted that the restriction $c_{i-1} = 1 \Rightarrow a_i = 0$
ensures that no carry can arise from the sum $m_i + a_i$ to interfere with the
unassimilated carry $c_{i-1}$. The second step is then

$$s_i = u_i + k_i \pmod{r},$$

$$b_{i-1} = 1 \text{ if } u_i + k_i \ge r,$$

$$b_{i-1} = 0 \text{ if } u_i + k_i < r.$$

Since $u_i = 0, 1, \ldots, r-1$, and $k_i = 0$ or 1, it follows that
$b_{i-1} = 1$ only if $u_i = r-1$, $k_i = 1$, and therefore
$b_{i-1} = 1 \Rightarrow s_i = 0$. It is thus possible to use the $b_i$ and
$s_i$ as augend (cf. $c_i$, $a_i$) in a subsequent addition. For the binary
system ($r = 2$), the equations are:

$$u_i = a_i \oplus m_i$$

$$k_{i-1} = a_i m_i \lor c_{i-1}$$

$$s_i = u_i \oplus k_i = a_i \oplus m_i \oplus (a_{i+1} m_{i+1} \lor c_i)$$

$$b_{i-1} = u_i k_i = (a_i \oplus m_i)(a_{i+1} m_{i+1} \lor c_i)$$

where the binary digits are related by Boolean operations defined in Table 8.1.

It should be noted that the above equations agree with those found by
the alternative method of considering modifications of a conventional binary
adder.
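
The two-step rule for an arbitrary radix can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
names `add_separate_carry` and `value` are invented here, digits are held in
lists with index 0 as the most significant position, and carries out of
position 0 are simply dropped. The sample call uses digits consistent with the
$u_i$, $k_i$, $s_i$ and $b_i$ rows of Example 8.1 (augend 605 with carries 100,
addend 199), with a leading zero added for the position left of the point.

```python
def add_separate_carry(a, c, m, r):
    """One addition in radix r with separately stored carries (a sketch).

    a, c : digits and carries of the unassimilated augend
    m    : digits of the assimilated addend
    Index 0 is the most significant position; c[i] has the weight of a[i].
    Carries out of position 0 are dropped, so results are modulo r**len(a).
    """
    n = len(a)
    # Restriction from the text: c[i-1] = 1 implies a[i] = 0.
    assert all(not (c[i - 1] and a[i]) for i in range(1, n))

    # First step: digitwise sums u and displaced carries k.
    u = [(a[i] + m[i]) % r for i in range(n)]
    k = [0] * n
    k[n - 1] = c[n - 1]                 # no position to the right of the last one
    for i in range(1, n):
        k[i - 1] = 1 if (c[i - 1] == 1 or a[i] + m[i] >= r) else 0

    # Second step: digitwise sums s and displaced carries b.
    s = [(u[i] + k[i]) % r for i in range(n)]
    b = [0] * n
    for i in range(1, n):
        b[i - 1] = 1 if u[i] + k[i] >= r else 0
    return s, b


def value(digits, carries, r):
    """Assimilated value of a digit/carry pair, modulo r**len(digits)."""
    n = len(digits)
    total = sum((digits[i] + carries[i]) * r ** (n - 1 - i) for i in range(n))
    return total % r ** n


s, b = add_separate_carry([0, 6, 0, 5], [0, 1, 0, 0], [0, 1, 9, 9], r=10)
print(s, b)   # [0, 8, 0, 4] [0, 1, 0, 0]
```

The result again satisfies the restriction that a carry digit of one is
followed by a sum digit of zero, so it can serve directly as the augend of the
next addition.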

8.4 Binary Subtraction 

In an arithmetic unit with negative numbers represented in two's
complement form, subtraction is performed as an addition of the digitwise
complement of the subtrahend (in M) to the minuend (in A and C) with a carry
inserted into the least significant digit of the adder. For the quasi-adder
with separate carry storage, the least significant digits of a sum are:

$$b_{n-2} = (a_{n-1} \oplus m_{n-1})(a_n m_n \lor c_{n-1})$$

$$b_{n-1} = (a_n \oplus m_n)\, c_n \qquad s_{n-1} = a_{n-1} \oplus m_{n-1} \oplus (a_n m_n \lor c_{n-1})$$

$$b_n = 0 \qquad s_n = a_n \oplus m_n \oplus c_n$$

By analogy with a conventional addition, the digit $c_n$ is zero for addition
and is one for subtraction. In either case, $b_n$ is always zero. If $c_n$ were
to be interpreted as an unassimilated carry digit resulting from an addition or
subtraction, it would follow that $c_n = b_n = 0$; it is therefore possible to
reinterpret $c_n$ as a carry insertion signal generated by the control circuits.



8.5 Carry Assimilation 

An important consequence of selecting points B of Figure 8.2 as the
break points in the carry chain is that the digits $s_i$ and $b_{i-1}$ of the
$i$-th digital position cannot simultaneously be ones. This readily follows if
we take $u_i$ and $k_i$ as defined in section 8.3 for radix 2 and note that
$s_i = u_i \oplus k_i$, $b_{i-1} = u_i \cdot k_i$, and note further that
$s_i \cdot b_{i-1} = (u_i \oplus k_i) \cdot (u_i \cdot k_i) = 0$.

For carry assimilation, we first assume that a full adder of the
structure of Figure 8.2 is required, with inputs $s_i$ and $b_i$, with internal
carries $d_i$, and with sum digits $a_i$. The equations for carry assimilation
are then

$$a_i = s_i \oplus b_i \oplus d_i \qquad d_{i-1} = (b_i \oplus d_i)\, s_i \lor b_i \cdot d_i .$$

We now show that $b_i \cdot d_i = 0$ follows from $s_i \cdot b_{i-1} = 0$, by
induction on $i$. For $i = n$, $d_n = 0$, therefore $b_n d_n = 0$. The induction
hypothesis is $b_i \cdot d_i = 0$. If $d_{i-1} = 1$, then $s_i$ must equal 1,
implying that $b_{i-1} = 0$ and $b_{i-1} d_{i-1} = 0$. If $b_{i-1} = 1$, then
$s_i = 0$ and $d_{i-1} = 0$. The carry assimilation equations can therefore be
simplified to

$$a_i = s_i \oplus (b_i \lor d_i) \qquad d_{i-1} = s_i \cdot (b_i \lor d_i).$$

Thus the circuitry for carry assimilation is approximately half as complicated
as the circuitry for a conventional adder.

The remarks of the preceding paragraph are based upon use of a
conventional carry chain in the carry assimilation circuitry, rather than carry
completion circuitry.
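
The simplified assimilation equations can be checked exhaustively for a short
register. The following sketch is illustrative only: the helper names
`assimilate` and `to_int` and the 5-digit width are assumptions, and digit
lists run from the most significant position 0 to position $n$.

```python
from itertools import product


def assimilate(s, b):
    """Assimilate carries b into digits s using the simplified equations."""
    n = len(s) - 1
    a = [0] * (n + 1)
    d = 0                         # d_n = 0
    for i in range(n, -1, -1):
        x = b[i] | d              # b_i or d_i (never both one)
        a[i] = s[i] ^ x           # a_i = s_i xor (b_i or d_i)
        d = s[i] & x              # d_{i-1} = s_i (b_i or d_i)
    return a


def to_int(bits):
    """Read a digit sequence as an unsigned binary integer."""
    return int("".join(map(str, bits)), 2)


width = 5
checked = 0
for s in product((0, 1), repeat=width):
    for b in product((0, 1), repeat=width):
        if any(s[i] & b[i - 1] for i in range(1, width)):
            continue              # excluded by s_i . b_{i-1} = 0
        expected = (to_int(s) + to_int(b)) % (1 << width)
        assert to_int(assimilate(s, b)) == expected
        checked += 1
print(checked, "admissible (s, b) pairs agree with ordinary addition")
```

Each position needs one OR, one exclusive-OR and one AND, which is the sense
in which the circuit is roughly half of a conventional full adder.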

The completion of the longest carry sequence during an assimilation
can be sensed by use of a zero's carry signal $d_i^0$ as well as the usual one's
carry signal $d_i^1$. The Boolean equations for assimilation with a carry
completion signal are, for the $i$-th digital position:

$$a_i = s_i \oplus (b_i \lor d_i^1)$$

$$d_{i-1}^1 = g\, s_i\, (b_i \lor d_i^1)$$

$$d_{i-1}^0 = g\, (\bar{s}_i \lor \bar{b}_i\, d_i^0)$$

$$h = (d_0^0 \lor d_0^1)(d_1^0 \lor d_1^1) \cdots (d_{n-1}^0 \lor d_{n-1}^1)$$

where $g$ and $h$ are signals initiating the assimilation and indicating the
completion of carries, respectively.

In operation, $g$ is initially 0, and $d_i^0 = d_i^1 = 0$ for each $i$. When
assimilation is desired, $g$ becomes 1 and at any one digital position, one of
three operations will occur.

1. A zero's carry will arise if $s_i = 0$.

2. A one's carry will arise if $s_i = b_i = 1$.

3. If $s_i = 1$ and $b_i = 0$, either a zero's carry $d_{i-1}^0$ or a one's
carry $d_{i-1}^1$ will be propagated after the corresponding carry $d_i^0$ or
$d_i^1$ is generated by the next digital position to the right.

When either a one's carry or zero's carry is generated at every digital position,
completion of carries is signalled by $h = 1$.
Alternatively, carry assimilation could be performed by converting
the structure of Figure 8.3 to that of Figure 8.2 by suitable switching circuits.
It appears that switching circuits for such conversions are as complicated as
those for separate carry assimilation described in the preceding paragraphs, and
have the further disadvantage of increasing the number of circuits through which
the carry must propagate.

For maximum speed of operation, carry assimilation should be performed
only when absolutely necessary, namely, when the number in A is to be transferred
elsewhere, or in parallel with some other operation, e.g., reading from core
storage. For sign conditional operations, such as those which occur in division
and sign conditional jumps, it is proposed that the sign digit carry completion
signal be used to indicate that the sign has been correctly determined.

8.6 Analysis of Overflow: Introduction

The study of the behavior of an accumulator with separate carry storage
requires an explanation of the representation of numbers with carries separately
stored. Overflow analysis is done largely by analogy with conventional overflow,
but is complicated not only by the fact that the number representation is unfamiliar
but also by the fact that carries propagate to the left and must be disposed of
if the number representation is to be consistent.

Precisely what is meant by overflow must be redefined, since the fact
that carries are unassimilated introduces uncertainty as to the range of numbers
represented; the difficulties seem largely conceptual and require no exorbitant
increase in equipment for mechanization.

An overflow analysis is applicable in two situations:

1. Detection of overflow during the fixed point operations of left
shift and addition or subtraction and the equivalent problem of
scaling for floating point.

2. Analysis of the repetitive steps performed during a multiplication
or a division. Although overflow is temporarily permitted to
occur, the product or quotient may be in range.

The latter situation is to be distinguished from the detection of
overflow in a product or quotient; overflow detection in these cases is described,
respectively, with the discussions of multiplication and division procedures.



8.7 Conventional Overflow Analysis 

The analysis is restricted to binary fractions with negative numbers
represented as complements with respect to two; i.e., a given number $x$ lies in
the range $-1 \le x < 1$ and is represented by binary digits
$x_0, x_1, \ldots, x_n$ such that

$$x = -x_0 + \sum_{i=1}^{n} 2^{-i} x_i .$$

Overflow is said to occur if, as a result of some arithmetic operation, a
result $x$ lies outside the range $-1 \le x < 1$.

Overflow can be detected if each result $x$ which may exceed range is
represented as a complement with respect to four; i.e., if the register holding
the number $x$ is extended one binary digit to the left. The two digits to the
left of the binary point are designated as $x_{-1}$ and $x_0$, and indicate the
range of $x$ as follows:

      $x_{-1}$   $x_0$      range of $x$

         0         0        $0 \le x < 1$
         0         1        $1 \le x < 2$
         1         0        $-2 \le x < -1$
         1         1        $-1 \le x < 0$

For the table above to be correct, a result $x$ which may exceed the
range $-1 \le x < 1$ must nonetheless remain within the range $-2 \le x < 2$.
This condition is obviously satisfied by a sum or difference of two operands
which are fractions, and by the result of the doubling of a fraction.
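
The table amounts to the rule that overflow has occurred exactly when the two
digits to the left of the binary point differ. A small check of that reading,
using exact fractions, might look as follows; the names `sign_digits` and
`overflowed` and the choice $n = 3$ are assumptions made for the example.

```python
from fractions import Fraction


def sign_digits(x, n):
    """Return (x_{-1}, x_0) for x in -2 <= x < 2, held as a complement with
    respect to four with n digits to the right of the binary point."""
    scaled = int(x * 2 ** n) % (4 * 2 ** n)     # x must be a multiple of 2**-n
    return (scaled >> (n + 1)) & 1, (scaled >> n) & 1


def overflowed(x, n):
    """Rows 01 and 10 of the table indicate a result outside -1 <= x < 1."""
    x_m1, x_0 = sign_digits(x, n)
    return x_m1 != x_0


n = 3
operands = [Fraction(p, 2 ** n) for p in range(-2 ** n, 2 ** n)]
for x in operands:
    assert overflowed(2 * x, n) == (not -1 <= 2 * x < 1)
    for y in operands:
        for result in (x + y, x - y):
            assert overflowed(result, n) == (not -1 <= result < 1)
print("sum, difference and doubling agree with the table")
```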

A somewhat more general viewpoint is useful for the discussions to
follow. A number $x$ can be represented as a complement with respect to $2^m$ by
extending the register in which $x$ is held to the left. There are then $m$
digits to the left of the binary point, designated as
$x_{-m+1}, x_{-m+2}, \ldots, x_{-2}, x_{-1}, x_0$.
Initially, $m$ is chosen sufficiently large, and a particular value of an index
$k$ is found such that $x_{-k}$ is a model of all digits to the left, i.e.,
$x_{-m+1} = x_{-m+2} = \cdots = x_{-k-1} = x_{-k}$. In this context $x_0$ is a
model of all digits to the left if the range $-1 \le x < 1$ is not exceeded,
and $x_{-1}$ is a model of all digits to the left if overflow occurs during one
of the elementary operations of addition, subtraction, or left shift of one
digital position.



8.8 Number Representation with Separate Carry Storage 

In an arithmetic unit with separate carry storage, a number $x$ is
represented by two sets of binary digits $a_0, a_1, a_2, \ldots, a_n$ and
$c_0, c_1, \ldots, c_{n-1}$, with the $a_i$ in the accumulator and the $c_i$ in
the associated carry register. $a_{-1}$ and $c_{-1} = 0$ are models of all
digits to the left. Employing the relationship $c_{i-1} a_i = 0$ for
$i = 1, \ldots, n$ we can deduce from the digits $a_{-1}$, $a_0$, $c_0$ the
information of Table 8.2 concerning the range of $x$.

Table 8.2

      $a_{-1}$  $a_0$  $c_0$    Range of $x$

         0       0      0      $0 \le x < 1\tfrac{1}{2}$
         0       0      1      $1 \le x < 2$                                    x lies outside the range $-1 \le x < 1$
         0       1      0      $1 \le x < 2$ or $-2 \le x < -1\tfrac{1}{2}$     x lies outside the range $-1 \le x < 1$
         0       1      1      $-2 \le x < -1$                                  x lies outside the range $-1 \le x < 1$
         1       0      0      $-2 \le x < -\tfrac{1}{2}$
         1       0      1      $-1 \le x < 0$                                   x is within the range $-1 \le x < 1$
         1       1      0      $-1 \le x < \tfrac{1}{2}$                        x is within the range $-1 \le x < 1$
         1       1      1      $0 \le x < 1$                                    x is within the range $-1 \le x < 1$

In essence, the effect of the carry assimilation is additive and can, if
$c_0 = 0$, increase the sum $2a_{-1} + (a_0 + c_0)$ by one unit. If $c_0 = 1$,
the assimilation of carries will not affect the sum $2a_{-1} + (a_0 + c_0)$ and
the results of the conventional overflow analysis can be applied with
$2x_{-1} + x_0 = 2a_{-1} + (a_0 + c_0)$ (modulo 4).

From the table it is apparent that three cases arise:

1. (Class I) The unassimilated results indicate definitely that $x$ lies
outside the range $-1 \le x < 1$. This unassimilated overflow
occurs when $a_{-1}$, $a_0$, and $c_0$ have states 001, 010, or 011.

2. (Class II) The number $x$ may or may not lie outside the range
$-1 \le x < 1$ after assimilation. (States 000 and 100 of $a_{-1}$, $a_0$,
and $c_0$.)

3. (Class II) The number $x$ is definitely in the range $-1 \le x < 1$.
(States 101, 110, 111 of $a_{-1}$, $a_0$, and $c_0$.)

It is important to distinguish between unassimilated overflow (Case 1)
and assimilated overflow which may occur in Case 2. For convenience in
terminology, a number is said to be Class I if unassimilated overflow has
occurred, and is said to be Class II otherwise.

It is necessary to consider two distinct representations of numbers:
the unassimilated representation in the accumulator and carry registers and the
conventional (assimilated) representation elsewhere. The function of the carry
assimilator is to convert from the unassimilated representation to the
conventional one; the conventional representation is a special case of the
unassimilated representation, with the carry register zero.
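
The three cases can be reproduced by enumerating every admissible digit
pattern for a short register and recording whether the represented value falls
in range. The sketch below assumes $n = 3$, reads the pattern as a complement
with respect to four, and uses an invented function name `classify`; it is a
check of the table, not part of the report.

```python
from fractions import Fraction
from itertools import product


def classify(n=3):
    """Map each state (a_{-1}, a_0, c_0) to 'overflow', 'undecided' or 'in range'."""
    flags = {}
    for a in product((0, 1), repeat=n + 2):          # a_{-1}, a_0, a_1, ..., a_n
        for c in product((0, 1), repeat=n):          # c_0, ..., c_{n-1}
            # admissible patterns satisfy c_{i-1} a_i = 0 for i = 1, ..., n
            if any(c[i - 1] & a[i + 1] for i in range(1, n + 1)):
                continue
            v = Fraction(2 * a[0] + a[1] + c[0])
            v += sum(Fraction(a[i + 1], 2 ** i) for i in range(1, n + 1))
            v += sum(Fraction(c[i], 2 ** i) for i in range(1, n))
            x = (v + 2) % 4 - 2                      # complement with respect to four
            flags.setdefault((a[0], a[1], c[0]), []).append(-1 <= x < 1)
    verdicts = {}
    for state, inside in flags.items():
        verdicts[state] = ("in range" if all(inside)
                           else "undecided" if any(inside) else "overflow")
    return verdicts


for state, verdict in sorted(classify().items()):
    print(state, verdict)
```

States reported as "overflow" are the Class I states; the remaining five are
Class II.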



8.9 Method of Analysis for Overflow 

The representation of a number in the accumulator and carry registers
comparable to the "in range" conventional representation is that of a Class II
number (no unassimilated overflow) with $a_{-1}$ and $c_{-1} = 0$ model digits
for all digits of greater significance. With a Class II number as an operand,
the result of an operation will not, in general, be a Class II number, nor will
the model digits of the result have the same significance as those of the
operand. If the result is represented by digits $s_i$ and $b_i$, then there is
some value of $k \ge 1$ such that $s_{-k}$ and $b_{-k} = 0$ are model digits for
the result. A partial assimilation of carries to the left of $s_0$ and $b_0$
will ensure that $b_{-1} = 0$ is a model digit for the carry register; however,
in some special cases $s_{-1}$ is not the model digit for the accumulator
register. It can be shown that unassimilated overflow occurs if, after the
partial assimilation, either

1. $s_{-1}$ is not the model digit for the $s_{-i}$ for $i > 1$, or

2. $s_{-1}$ and $b_{-1} = 0$ are model digits, and the result is a Class I
number.

Thus, if the operands are initially Class II numbers, the result is also Class II
unless unassimilated overflow has occurred.

For the analysis of multiplication, it is necessary to show that if a
partial product $p_k$ is a Class II number, then the next partial product
$p_{k+1}$ is also a Class II number, where $p_{k+1}$ is formed by a right shift
of the sum or difference $p_k \pm y$, where $y$ is the assimilated multiplicand.
For division a left shift of a partial remainder is followed by the addition or
subtraction of the divisor $y$ in such a way that $|r_k| \le |y|$, for every
$k$. Thus, for the division analysis it is necessary to show that if $r_k$ is
Class II, then $r_{k+1}$ is also a Class II number. Overflow may occur
temporarily in either case.

The net result of the analysis is that it is possible to perform a
sequence of arithmetic operations without carry assimilation. For overflow
detection, it is sufficient to detect unassimilated overflow during the sequence
of operations and to detect conventional overflow when the result of the sequence
of operations is assimilated. Carry assimilation is not required for either
overflow detection or for a multiplication step, but is required during some
steps of a division in order that the proper choice of addition or subtraction
of the divisor can be made.
8.10 Overflow Detection: Left Shift of One Digital Position



For a left shift of one digital position we form the $s_i$ and $b_i$ to the
left of the binary point ($i \le 0$) from the $a_i$ and $c_i$ as follows:

$$s_0 = a_1 \qquad b_0 = c_1$$

$$s_{-1} = a_0 \oplus c_0 \qquad b_{-1} = 0$$

$$s_{-2} = a_{-1} \oplus a_0 c_0 \qquad b_{-2} = 0$$

$$s_{-3} = a_{-1} \oplus a_{-1} a_0 c_0$$

Note that a partial assimilation of carries to the left of $b_0$ has been
performed, yielding $b_{-1} = 0$ and $s_{-1}$ as model digits. If initially the
operand represented by the $a_i$ and $c_i$ was Class II (no unassimilated
overflow) then the resulting $s_i$ and $b_i$ of interest are (using
$c_0 a_1 = 0$):

      $a_{-1}$  $a_0$  $c_0$     $s_{-3}$  $s_{-2}$  $s_{-1}$  $s_0$   $b_0$

         0       0      0           0         0         0      $a_1$   $c_1$
         1       0      0           1         1         0      $a_1$   $c_1$
         1       0      1           1         1         1        0     $c_1$
         1       1      0           1         1         1      $a_1$   $c_1$
         1       1      1           0         0         0        0     $c_1$

The exceptional case for which $s_{-1}$ is not the model digit for the
$s_{-i}$ for $i > 1$ is that case in which $a_{-1} = 1$, $a_0 = 0$, and
$c_0 = 0$. For this case, inspection of Table 8.2 reveals that the range of the
corresponding operand $x$ before the shift is $-2 \le x < -1/2$; therefore the
result $y = 2x$ after the shift is in the range $-4 \le y < -1$, and overflow
occurs.

For the remaining cases, $s_{-1}$ serves as the model digit for the
$s_{-i}$ ($i > 1$). In these cases the digits $s_{-1}$, $s_0$, and $b_0$ can be
inspected and the shifted result categorized as a Class I or Class II result in
accordance with Table 8.2. It can easily be verified that, for each of the
possible Class I results, the values of $s_0 = a_1$ and $b_0 = c_1$ are such
that the corresponding range of the unshifted operand would indicate that
overflow would occur during the left shift.

In short, overflow detection during left shift involves sensing the
special case $a_{-1} = 1$, $a_0 = 0$, and $c_0 = 0$ before the shift, and
detection of Class I numbers after the shift.
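
The left-shift rule can be exercised by brute force over a short register. The
sketch below is a check under assumptions of its own: a Class II pattern is
given the value $-2a_{-1} + a_0 + c_0$ plus its fractional digits and carries,
the register has $n = 4$ fractional positions, and the names `value`,
`CLASS_ONE` and `check_left_shift` are invented. It asserts that every flagged
case really has $2x$ outside $-1 \le 2x < 1$, and that every unflagged case is
correctly represented after the shift with $s_{-1}$ as model digit.

```python
from fractions import Fraction
from itertools import product

CLASS_ONE = {(0, 0, 1), (0, 1, 0), (0, 1, 1)}    # Class I states of Table 8.2


def value(model, d0, c0, digits, carries):
    """Value of a pattern whose model digit is taken with weight -2."""
    v = Fraction(-2 * model + d0 + c0)
    for i, (d, c) in enumerate(zip(digits, carries), start=1):
        v += Fraction(d + c, 2 ** i)
    return v


def check_left_shift(n=4):
    flagged_cases = kept_cases = 0
    for bits in product((0, 1), repeat=2 * n + 3):
        a_m1, a0, c0 = bits[:3]
        a_tail, c_tail = bits[3:3 + n], bits[3 + n:]      # a_1..a_n and c_1..c_n
        chain = (c0,) + c_tail
        if any(chain[i] & a_tail[i] for i in range(n)):   # need c_{i-1} a_i = 0
            continue
        if (a_m1, a0, c0) in CLASS_ONE:
            continue                                      # operand must be Class II
        x = value(a_m1, a0, c0, a_tail, c_tail)
        s_m1, s0, b0 = a0 ^ c0, a_tail[0], c_tail[0]      # digits after the shift
        if (a_m1, a0, c0) == (1, 0, 0) or (s_m1, s0, b0) in CLASS_ONE:
            assert not -1 <= 2 * x < 1                    # flagged: 2x is out of range
            flagged_cases += 1
        else:
            shifted = value(s_m1, s0, b0, a_tail[1:] + (0,), c_tail[1:] + (0,))
            assert shifted == 2 * x                       # s_{-1} is a valid model digit
            kept_cases += 1
    return flagged_cases, kept_cases


print(check_left_shift())
```

An unflagged result may still turn out to exceed range once its carries are
assimilated; that is the assimilated overflow of Case 2, detected later.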

8.11 Overflow Detection: Addition or Subtraction

Since subtraction is executed as the addition of the complement of
the subtrahend, it is sufficient without loss of generality to analyze the
operation of addition only. Assuming that $m_0$ is the model digit for the
addend, and that $c_{-1} = 0$ and $a_{-1}$ are model digits for the augend,
application of the "adder" equations (Figure 8.3):

$$s_i = a_i \oplus m_i \oplus (a_{i+1} m_{i+1} \lor c_i)$$

$$b_{i-1} = (a_i \oplus m_i)(a_{i+1} m_{i+1} \lor c_i)$$

yields

$$s_0 = a_0 \oplus m_0 \oplus k, \quad \text{where } k = a_1 m_1 \lor c_0$$

$$s_{-1} = a_{-1} \oplus \bar{a}_0 m_0 \qquad b_{-1} = (a_0 \oplus m_0)\, k$$

$$s_{-i} = a_{-1} \lor m_0, \; i \ge 2 \qquad b_{-2} = \bar{a}_{-1} a_0 m_0$$

$$b_{-i} = 0, \; i \ge 3.$$

Assimilation to the left of $b_0$, $s_0$ yields

$$s'_{-1} = a_{-1} \oplus (\bar{a}_0 m_0 \bar{k} \lor a_0 \bar{m}_0 k) \qquad b'_{-1} = 0$$

$$s'_{-2} = a_{-1}(\bar{a}_0 \lor m_0 \lor \bar{k}) \lor \bar{a}_{-1} \bar{a}_0 m_0 \bar{k} \qquad b'_{-i} = 0, \; i \ge 1$$

$$s'_{-i} = s'_{-2}, \; i \ge 2$$

where $k = a_1 m_1 \lor c_0$.

The values of $s'_{-2}$, $s'_{-1}$ and $s_0$ as functions of $k$, $a_{-1}$,
$a_0$, and $m_0$ are summarized in Table 8.3. In this table $a_{-1} = 0$,
$a_0 = 1$ indicate overflow in the initial augend, and need not be considered
further. Of the remaining cases $s'_{-1}$ serves as the model digit except for
the one case for which $k = a_0 = 0$, $a_{-1} = m_0 = 1$. Since
$k = a_1 m_1 \lor c_0 = 0$, then $c_0 = 0$ and either $m_1$ or $a_1$ is 0. The
two cases are:

1. $a_1 = 0$ with $a_{-1} = 1$, $a_0 = 0$, $c_0 = 0$ imply that $x$ in A and C
is in the range $-2 \le x < -1$ and $m_0 = 1$ implies that $m$ in M is in the
range $-1 \le m < 0$; therefore the sum $s$ in $\bar{A}$ and $\bar{C}$ is in the
range $-3 \le s < -1$, and is outside the range $-1 \le s < 1$.

2. $m_1 = 0$ with $m_0 = 1$ implies that $m$ is in the range
$-1 \le m < -1/2$ and $a_{-1} = 1$, $a_0 = 0$, $c_0 = 0$ imply that $x$ is in
the range $-2 \le x < -1/2$; therefore $s$ is in the range $-3 \le s < -1$, and
is outside the range $-1 \le s < 1$.

Thus, for addition, overflow is detected by the usual unassimilated overflow
check on the sum (for Class I results) and by sensing the special case
$k = a_0 = 0$, $a_{-1} = m_0 = 1$.
Table 8.3

      $k$  $a_{-1}$  $a_0$  $m_0$     $s'_{-2}$  $s'_{-1}$  $s_0$

       0      0       0      0           0          0        0
       0      0       0      1           1          1        1
       0      0       1      0           0          0        1      Overflow in A, C
       0      0       1      1           0          0        0      Overflow in A, C
       0      1       0      0           1          1        0
       0      1       0      1           1          0        1      See text
       0      1       1      0           1          1        1
       0      1       1      1           1          1        0
       1      0       0      0           0          0        1
       1      0       0      1           0          0        0
       1      0       1      0           0          1        0      Overflow in A, C
       1      0       1      1           0          0        1      Overflow in A, C
       1      1       0      0           1          1        1
       1      1       0      1           1          1        0
       1      1       1      0           0          0        0
       1      1       1      1           1          1        1

8.12 Overflow Analysis: Addition or Subtraction Followed by a Right Shift

One step of a multiplication can be described by the formula

$$p_{k+1} = \tfrac{1}{2}(p_k + y_k x)$$

where $x$ is the multiplicand in M, $y_k$ is a digit of the recoded multiplier
having one of the values $-1$, 0, or 1, and $p_k$ and $p_{k+1}$ are successive
partial products in unassimilated form in A and C. The quantity
$(p_k + y_k x)$ in $\bar{A}$ and $\bar{C}$ is permitted to exceed range, but
$p_{k+1}$ is correctly represented if the digit $a'_{-1}$ inserted during the
right shift is properly chosen. The operation will be correctly performed if,
employing the notation of the previous section for the digits of $\bar{A}$ and
$\bar{C}$ representing the sum,

$$a'_{-1} = s'_{-2}, \qquad a'_0 = s'_{-1}, \qquad a'_1 = s_0, \qquad \text{etc.},$$

where the $a'_i$ and $c'_i$ are digits of A and C representing $p_{k+1}$.

It should be noted that the digit $a'_{-1}$ inserted during the right shift
does not necessarily indicate the sign of $p_{k+1}$. However, if $s'_{-2}$ would
have been changed by assimilation of carries before the shift, then $a'_{-1}$
would be similarly changed by assimilation after the shift.

The special case of $k$, $a_{-1}$, $a_0$, and $m_0$ respectively 0, 1, 0, and 1
requires dissimilar treatment in detection of addition overflow from that
required for a multiplication step. The sum digit $s'_{-2}$ need not be
generated for overflow detection, since $s'_{-1} = 0$ and $s_0 = 1$ by
themselves indicate overflow. The digit $s'_{-2}$ is in this exceptional case
necessary for the right shift during multiplication for insertion as the digit
$a'_{-1}$. Otherwise $s'_{-2}$ need not be generated at all, since $s'_{-1}$ is
the model digit for A, and $a'_{-1} = a'_0 = s'_{-1}$ would be the correct
digits in A for $p_{k+1}$ after the right shift.



8.13 Overflow Analysis: Left Shift Followed by an Addition or Subtraction
Such That the Result is in Range

Each step of a non-restoring division can be described by the formula

$$r_{k+1} = 2r_k \pm y$$

where $y$ is the divisor in M with digits $m_i$, and $r_k$ and $r_{k+1}$ are
successive partial remainders in $\bar{A}$ and $\bar{C}$ with digits $s_i$ and
$b_i$ representing $r_k$ and digits $s'_i$ and $b'_i$ representing $r_{k+1}$.
The choice between addition and subtraction is made in such a way that each
$r_k$ satisfies $|r_k| \le |y| \le 1$, although $2r_k$ may be outside the range
$-1 \le 2r_k < 1$. The sign of $r_k$ is determined by a partial carry
assimilation to the sign digit; the divisor is subtracted from $2r_k$ if signs
of $r_k$ and $y$ agree and is added to $2r_k$ if signs of $r_k$ and $y$
disagree. This arithmetic procedure guarantees that if $|r_k| \le |y|$, then
$|r_{k+1}| \le |y|$. It also guarantees that $r_k$ is within the range
$-1 \le r_k < 1$, since $|y| \le 1$. It remains to be shown that $s'_{-1}$ and
$b'_{-1} = 0$ are correct model digits for $r_{k+1}$.

The analyses of previous sections for the left shift and addition and
subtraction can be applied directly to show that $s'_{-1}$ and $b'_{-1} = 0$ are
correct model digits for $r_{k+1}$ if the digits $m_i$ of the divisor $y$ are
chosen so that the relation $|r_k| \le |y| \le 1$ is satisfied, except for the
special case for which the digits $s_{-1}$, $s_0$, and $b_0$ of $r_k$ are
respectively 1, 0, and 0.

In the special case, the digits $a_{-2}$ and $a_{-1}$ of $2r_k$ are unequal, so
that the assumption that $a_{-1}$ is the model digit for the register A used in
the addition-subtraction analysis is violated. The relation
$|r_k| \le |y| \le 1$ imposes restrictions on the digits of $r_k$ (and therefore
on $2r_k$) and $y$ (or its complement) such that we need consider only the two
cases:

                      $r_k$                                   $2r_k$                           $y$
      $s_{-1}$ $s_0$ $s_1$ $s_2$ $b_0$ $b_1$     $a_{-2}$ $a_{-1}$ $a_0$ $a_1$ $c_0$     $m_0$ $m_1$     $k = a_1 m_1 \lor c_0$

         1      0     1     0     0     1           1        0      1     0     1         0     1               1
         1      0     1     1     0     0           1        0      1     1     0         0     1               1

The values of $s_1$, $s_2$, and $b_1$ must be those shown if $r_k \ge -1$; also,
for the special case, $r_k < -1/2$ and therefore $m_0 = 0$, $m_1 = 1$ if
$|y| \ge |r_k|$. By an extension of the addition analysis, it is easily shown
that, for $r_{k+1}$, the digits $s'_{-2} = s'_{-1} = 1$, $s'_0 = 0$. Thus
$s'_{-1}$ is the correct model digit for A in the special case.
8.14 Multiplication: Introduction

The multiplication procedure described here is the result of an
investigation into methods of exploiting the relatively simple arithmetic unit
structure of Figure 8.4, or some minor variant thereof. Comparison of Figure 8.4
with the Illiac arithmetic unit shown in Figure 8.1 indicates that only equipment
required for separate carry storage has been added. Although some studies have
been made of more complicated structures most easily described as binary versions
of existing decimal arithmetic units, it is felt that the basic principles can
most easily be presented in terms of the structure of Figure 8.4, and that these
principles can be applied to multiple-adder or other more complicated
arrangements should it prove desirable.

Figure 8.4. Block Diagram of the Proposed Arithmetic Unit
Without the equivalent of many adders available, multiplication
necessarily involves a serial or serial-parallel sensing of the multiplier, and 
a sequence of conditional uses of the adder and shifts. 

The sensing of the multiplier can begin with either the most or the 
least significant digit. In either case, the accumulator must be augmented by 
a second register, if all digits of the double-length product are represented. 
For initial sensing of the most significant digit, the accumulator is extended 
to the left, with the augmenting register holding the most significant half of 
the product; for initial sensing of the least significant digit, the accumulator 
is extended to the right, and the augmenting register holds the least significant
half of the product. Factors affecting the choice of the method of sensing the 
multiplier include: 

1. the desirability of providing facilities for shifting both to the 
right and to the left, 

2. the possibility of holding in the augmenting register both product 
digits and multiplier digits, 

3. the nature of auxiliary equipment required for the augmenting 
register. 

In regard to shifting facilities, division requires the left shift and 
it seems desirable to provide right shifting facilities as well. If the right 
shift were not made available to the programmer, the choice of initially sensing 
the most significant digit of the multiplier could result in a decrease in 
hardware by the amount of equipment required for the right shift. 

The decisive consideration is apparently the amount of equipment 
required for the augmenting or multiplier registers. If the choice is made to 
sense the most significant digit of the multiplier first, methods have not yet 
been devised for the multiplier recoding proposed such that the augmenting 
register can hold both multiplier and product digits, nor can complementing 
facilities for the augmenting register be avoided. The best choice thus seems 
to be the sensing of the least significant digit of the multiplier first, for
which methods of holding product and multiplier digits in a single register 
exist, and for which very little additional equipment over and above that 
required for a single length shifting register is required. The latter comment 
applies only if the choice of two's complement representation of negative 
numbers is made, as will be shown. 



8.15 The Recoding of the Multiplier 



With a complementing circuit located between the multiplicand in
register M and the adder, it is possible to both add and subtract the
multiplicand from a partial product held in registers A and C. A third choice
is to shift the partial product without use of the adder. Equivalently, it is
possible to recode the multiplier into the digits $-1$, 0, and $+1$; the recoding
is done in such a way that the number of uses of the adder is decreased, and
requires the sensing of two multiplier digits and a mode digit $w_{n-k+1}$. For a
multiplier with $n+1$ digits $y_0, y_1, \ldots, y_n$, the rules for the $k$-th
step ($k = 0, 1, \ldots, n$) are:

                  $w_{n-k+1}$  $y_{n-k-1}$  $y_{n-k}$    recoded multiplier digit    $w_{n-k}$  $t_{n-k}$

      + mode          0            0           0         0                              0          0
                      0            0           1         +1                             0          1
                      0            1           0         0                              0          0
                      0            1           1         -1, change mode to -           1          1

      - mode          1            0           0         +1, change mode to +           0          1
                      1            0           1         0                              1          0
                      1            1           0         -1                             1          1
                      1            1           1         0                              1          0

where for $k = 0$, $w_{n+1} = 0$, and for $k = n$, $y_{-1} = y_0$. In addition to
the equivalent recoded multiplier digit and the mode change rules, the table
shows the binary values of the new mode digit $w_{n-k}$, which controls the
setting of the complementing circuit during the $k$-th step, and a digit
$t_{n-k}$, which, if 1, indicates that the adder is used during the $k$-th step.
A negative multiplier poses no problems; it is only necessary that during step
$n$, $y_{-1} = y_0$, and the adder is used if $w_1$ and $y_{-1} = y_0$ are such
that $t_0 = 1$.
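
The recoding rules can be tried out in software. The sketch below is an
illustration under assumptions: it reads the table as $t = w \oplus y$ for the
digit being sensed, takes the new mode digit as the majority of the old mode
digit and the two sensed multiplier digits, and lets the new mode digit choose
between $+1$ and $-1$. The function names `recode` and `twos_value` are
invented for the example.

```python
from fractions import Fraction
from itertools import product


def recode(y):
    """Recode multiplier digits y = [y_0, ..., y_n] (two's complement, y_0 the
    sign digit) into digits -1, 0, +1, starting from the least significant end."""
    n = len(y) - 1
    w = 0                                        # mode digit w_{n+1} = 0 (+ mode)
    digits = [0] * (n + 1)
    for j in range(n, -1, -1):
        y_left = y[j - 1] if j > 0 else y[0]     # y_{-1} = y_0 on the last step
        t = w ^ y[j]                             # adder is used when t = 1
        w = (w & y[j]) | (w & y_left) | (y[j] & y_left)   # new mode digit
        digits[j] = t * (1 - 2 * w)              # new mode selects add or subtract
    return digits


def twos_value(y):
    """Value of a two's complement fraction with sign digit y_0."""
    return -y[0] + sum(Fraction(y[i], 2 ** i) for i in range(1, len(y)))


print(recode([0, 1, 1, 1]))    # [1, 0, 0, -1], i.e. 7/8 = 1 - 1/8
print(recode([1, 0, 1, 1]))    # [0, -1, 0, -1], i.e. -5/8 = -1/2 - 1/8

for y in product((0, 1), repeat=6):
    d = recode(list(y))
    assert sum(Fraction(d[j], 2 ** j) for j in range(6)) == twos_value(y)
```

The final loop confirms, for every 6-digit multiplier, that the recoded digits
represent the same value as the original two's complement digits, including
negative multipliers.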

It is relatively easy to show that the number of uses of the adder
tends toward 1/3 the number of multiplier digits, as the number of multiplier
digits increases, if the assumption is made that the binary digits of the
multiplier are independent and each is equally likely to be 0 or 1. We fix our
attention on one binary digit of the multiplier, say $y_j$, and inspect the
binary digits of lesser significance to determine whether or not a use of the
adder is indicated by the rules (specifically, by $t_j$). If $y_j = 0$, with
probability 1/2, then it is equally likely that $y_{j+1}$ is 0 or 1. If
$y_j = y_{j+1} = 0$, with probability 1/4, then $t_j = 0$. If $y_j = 0$,
$y_{j+1} = 1$, then $y_{j+2}$ must be inspected. The values $y_j = 0$,
$y_{j+1} = y_{j+2} = 1$ would yield $t_{j+1} = 0$, $t_j = 1$, with probability
1/8. By an obvious extension of the argument, the probability $p_1$ that
$t_j = 1$ is

$$p_1 = 2\left(\tfrac{1}{8} + \tfrac{1}{32} + \tfrac{1}{128} + \cdots\right) = \sum_{i \ge 1} \left(\tfrac{1}{4}\right)^i .$$

Thus $p_1$ tends toward 1/3 as the number of digits of the multiplier increases.
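
The limiting value can be observed numerically by counting adder uses over
every multiplier of a given length. This sketch assumes the same reading of
the recoding rules as a mode digit updated by majority, with the adder used
when the mode digit and the sensed digit differ; the names `adder_uses` and
`fraction_used` are invented for the example.

```python
from itertools import product


def adder_uses(y):
    """Count the steps with t = 1 for multiplier digits y = (y_0, ..., y_n)."""
    w, uses = 0, 0
    for j in range(len(y) - 1, -1, -1):
        y_left = y[j - 1] if j > 0 else y[0]     # y_{-1} = y_0 on the last step
        uses += w ^ y[j]                         # t_j = 1 means the adder is used
        w = (w & y[j]) | (w & y_left) | (y[j] & y_left)
    return uses


def fraction_used(digits):
    """Average fraction of steps using the adder, over all multipliers."""
    total = sum(adder_uses(y) for y in product((0, 1), repeat=digits))
    return total / (digits * 2 ** digits)


for digits in (4, 8, 12):
    print(digits, fraction_used(digits))
```

The printed fractions approach 1/3 from above as the number of digits grows,
because the first few steps at the least significant end use the adder
somewhat more often than the limiting rate.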

The multiplication time can be reduced if another feature of the
recoding rules is utilized. If an addition or subtraction occurs during the
$j$-th step of the process, i.e., if $t_j = 1$, then no use of the adder can
occur on the following $(j-1)$-st step. This feature can be verified by
inspection of the rules or by a Boolean proof that $t_j \cdot t_{j-1} = 0$, by
direct substitution of the equations:

$$t_j = w_{j+1} \oplus y_j$$

and

$$w_j = w_{j+1} y_j \oplus w_{j+1} y_{j-1} \oplus y_j y_{j-1} .$$

It therefore follows that, for any pair of recoded multiplier digits,
only one use of the adder is required. Thus, the number of gating operations
for shifts can be reduced by a factor of two if a modified base 4 multiplication
method is employed, with each shift displacing multiplier and partial product
digits two digital positions to the right. Base 4 operation requires that the
doubled multiplicand be available; that is, one base 4 digit of the multiplier
is recoded as one of the reversed quinary digits $-2$, $-1$, 0, 1, or 2.

8.16 The Multiplication Right Shift: Introduction

For the right shift, consideration must be given to the insertion of
the most significant digits into A and C and to the transfer of the least
significant digits of A and C into Q. The necessity for an explicit negative
multiplicand correction is avoided if the most significant digits are inserted
in such a way that the resulting partial product in A and C is correct, even
if overflow has occurred. Correct insertion of the most significant digits
is closely related to addition overflow, and is discussed in detail in section
8.12.

Two aspects of the transfer of digits from A and C into Q are described.
The first aspect is of fundamental importance in the choice of the mode of
representation of negative numbers and is concerned with the necessity for
corrections if the quantities in A and C and in Q are of opposite sign. The
second aspect is that carry assimilation is required for the digits transferred
into Q, if no carry register is to be associated with Q.

8.17 The Multiplication Right Shift: The Nature of Corrections if A and Q Are of Opposite Sign

The method of multiplication proposed implies that successive partial
products may differ in sign. A partial product p_k is represented by n+k+1
digital positions, occupying n+1 digital positions of A and C, and k digits of Q.
Since the adder has n+1 digital positions which sense A and C, it is necessary
to consider the effect of an addition or subtraction of the n+1 digit
multiplicand to A and C in such a way that the signs of successive partial products
differ. The difficulties that arise are illustrated by examples, one for each
mode of representation of negative numbers, of the addition 11/64 + (-5/8) = -29/64
under the assumption that n = k = 3. In each example, the addition is performed
correctly for the n+1 most significant digital positions, and compared with the
correct n+k+1 digit representation of -29/64.



                Absolute Value    One's Complement    Two's Complement

    11/64         0.001 011          0.001 011           0.001 011
    -5/8          1.101              1.010               1.011
    Sum           1.100 011          1.011 011           1.100 011
    -29/64        1.011 101          1.100 010           1.100 011

(In each entry the first group of digits is held in A and the second group in Q.)

The sum is correct in all examples provided that the digits in A are interpreted
as -1/2 and the digits in Q are interpreted as +2^-3 (+3/8) according to the rules
of the representation in use. It is only for the two's complement representation
that the digits of Q are independent of the sign of A; otherwise it must be
understood that the number in Q is positive and the number in A is negative.
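
The three examples can be checked mechanically. The following sketch encodes fractions in each of the three modes, performs the addition on the n+1 upper digits only, and compares the result with the correct n+k+1 digit representation. The helper names `encode` and `add_in_upper_digits` and the string form of the digits are assumptions made for the illustration.

```python
from fractions import Fraction

N, K = 3, 3  # digits after the sign in A, and digits held in Q


def encode(value, mode, frac_bits):
    """Encode a fraction with |value| < 1 as a string: sign digit, then frac_bits digits."""
    scale = 1 << frac_bits
    mag = int(abs(value) * scale)
    assert Fraction(mag, scale) == abs(value), "value needs more digits"
    if value >= 0 or mode == "absolute":
        field = mag                     # magnitude stored directly
    elif mode == "ones":
        field = (scale - 1) - mag       # every digit complemented
    elif mode == "twos":
        field = scale - mag             # complement plus one in the last place
    else:
        raise ValueError(mode)
    sign = "1" if value < 0 else "0"
    return sign + format(field, "0{}b".format(frac_bits))


def add_in_upper_digits(mode):
    """Add -5/8 to the n+1 upper digits of 11/64, leaving the k digits in Q alone.
    Returns (digits actually obtained, correct digits of -29/64)."""
    a_part = Fraction(1, 8)             # digits 0.001 held in A
    q_part = "011"                      # digits held in Q, not sensed by the adder
    upper = encode(a_part + Fraction(-5, 8), mode, N)
    correct = encode(Fraction(-29, 64), mode, N + K)
    return upper + q_part, correct
```

Only the two's complement case returns two identical digit strings, in agreement with the table.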

For the multiplier recoding proposed, or for an unrestricted hold-multiply
instruction with other multiplication methods, a change in sign in
partial products is inescapable. For arithmetic units employing the absolute
value or one's complement representation, the designer is then faced with the
problem of either altering the digits of both A and Q so that the correct
n+k+1 digit representation is maintained, or shifting the least significant
digit from A to Q where A and Q have opposite signs. Either alternative requires
that the sign of the partial product be known, but if the partial product is
represented with carries separately stored, its sign in general can only be determined
by a partial assimilation of carries. If the features of the multiplier recoding
and separate carry storage are to be fully exploited, the conclusion is that the
two's complement representation of negative numbers should be used.

8.18 The Multiplication Right Shift: Assimilation of Carries for Digits Transferred from A and C to Q

As noted in the discussion for binary subtraction, correct operation
requires that c_n, the least significant digit of the carry register C, be zero,
in order that the carry insertion for complementation will be correct. In a
sequence of subtractions, the digit b_n, which would subsequently be gated into
c_n, is always zero; however, during multiplication, a right shift of one digital
position requires that b_{n-1}, which may not be zero, be transferred into a carry
register R associated with Q, in order that c_n will be zero.

For purposes of analysis, we extend the discussion of carry assimilation
to an n+k+1 digit representation of the partial product, assuming temporarily that
A and C are n+1 digit registers augmented by k digits of Q with a carry register R
associated with Q. During a right shift of one digital position, the digit
s_n of A is transferred to become q_1 of Q, while b_{n-1} of C becomes r_0 of R. Thus
the relationship s_n b_{n-1} = 0 becomes q_1 r_0 = 0, and after k steps of a
multiplication, Q and R each contain k digits, q_j, r_{j-1}, with j = 1, 2, ..., k, and the
relationship q_j r_{j-1} = 0 holds for each j.

The existence of R is a temporary fiction, since assimilation of Q and
R can be performed serially as the shifting occurs. Assimilation of k digits of
Q yields an assimilator carry

    e_0 = q_1 (r_1 ∨ e_1),

where the relationship q_j r_{j-1} = 0 guarantees that e_0 r_0 = 0. One step later,
e_0 is shifted right to become e'_1, r_0 becomes r'_1, digits b_{n-1} and s'_n are
transferred from C and A into R and Q, and a single step of the assimilation yields

    r'_0 = b_{n-1},
    e'_0 = s'_n (r'_1 ∨ e'_1) = s'_n (r_0 ∨ e_0),
    q'_1 = s'_n ⊕ (r'_1 ∨ e'_1).

Thus, a single step of the assimilation can be performed with each right shift,
and a carry register associated with Q is not required. The assimilation of A
and C following a multiplication proceeds in accordance with that following any
other arithmetic operation, except that d_n = e_0, and b_n = r_0, with b_n d_n = 0.
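
The serial assimilation just described can be sketched as follows. The sketch assumes that, at each right shift, one sum digit s and one stored carry b (one position more significant than s, with s·b = 0) leave A and C; the names `assimilate_serially` and `value` are hypothetical, and digits are listed in the order in which they are shifted out, least significant first.

```python
def assimilate_serially(pairs):
    """pairs: list of (s, b) digits shifted out of A and C, first shifted out first.
    Returns the assimilated digits of Q (least significant first) and the single
    carry left over for the least significant position of A."""
    r = e = 0                      # r: carry just shifted out of C; e: assimilator carry
    q = []
    for s, b in pairs:
        assert s & b == 0          # a sum digit and its carry are never both 1
        assert r & e == 0          # so at most one carry enters each position
        carry_in = r | e
        q.append(s ^ carry_in)     # assimilated digit entering Q
        e = s & carry_in           # new assimilator carry
        r = b                      # carry digit transferred from C
    return q, r | e


def value(pairs):
    """Integer value of the unassimilated digits; s has weight 2^t, b weight 2^(t+1)."""
    return sum((s << t) + (b << (t + 1)) for t, (s, b) in enumerate(pairs))
```

Only the two single-digit quantities r and e need to be kept between steps, which is why no k-digit carry register is required for Q.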

8.19 Multiplication Overflow 

Overflow can occur in multiplication when the multiplicand x and
the multiplier y lie in the range -1 ≤ x < 1, -1 ≤ y < 1, if x = y = -1. Even
if a hold-multiply is executed, overflow is indicated correctly if signs of
multiplier, multiplicand, and product are simultaneously negative.

8.20 Division: Introduction (6)

The choice of restoring or non-restoring division requires consideration of

1. the requirements for mechanization of the method, with particular
attention to the mechanization of one step, and

2. the nature of the quotient and remainder resulting from the process.

During each step, each division process requires a permutation of the same
four operations. The paralleling of two operations is possible for
non-restoring division; serial sequencing of the four operations is necessary for
restoring division. With the exception of special cases such that divisor and
dividend are equal in absolute value, the quotients resulting from the two
processes are the same, and the corresponding remainders can be determined with
equal facility. Thus, the choice of the non-restoring division process can be
based on the timesaving made possible by paralleling of operations during each
step.

A further reduction in division time can be achieved by initially
shifting divisor and dividend left until the divisor y is standardized to lie
in the range 1/2 ≤ |y| < 1, and during the division, standardizing the quantities
held in A and C. The method can be applied to both restoring and non-restoring
division, but has the disadvantage that the remainder is relative to the
standardized divisor, rather than the unstandardized one. The method is
particularly attractive if the register M holding the divisor is a shifting
register, as would be convenient for floating point operation.

(6) A new quaternary division method, which in many respects is the inverse
of the multiplication described in section 8.15, is discussed in Digital
Computer Laboratory report no. 82, "A New Class of Digital Division Methods",
by James E. Robertson, March 5, 1958. The new division method supersedes
the method described in section 8.22.

The programmer is interested in the characteristics of the division
instructions available. It is proposed that two division instructions
representing a compromise between ease of programming and ease of mechanization be
available. The sorts of difficulties that arise can most easily be discussed
in connection with the formula

    q y + 2^-n r_n = r_0,

where q is the quotient formed in register Q, y is the divisor in M, r_n is
the remainder in A and C, and r_0 is the dividend initially in A and C. In
theory, q and r_n are functions of one another; for example, q may be so adjusted
that any one of the following restrictions applies to r_n:

    0 ≤ r_n < |y|,

    -1/2 |y| ≤ r_n < 1/2 |y|,

    0 ≤ |r_n| < |y|, such that r_n has the sign of the dividend,

etc. Unfortunately q is formed in the quotient register Q where addition
facilities are unavailable, and the nature of q as formed by a mechanized
process is such that the mathematically attractive choices of restrictions on r_n
cannot be achieved without an additive or subtractive change of more than the
least significant digit of q.

With these considerations in mind, the two division instructions
proposed are then:

1. Correctly rounded quotient in A with no remainder. This instruction
requires that n+1 non-sign digits of a quotient q be determined,
that q be transferred to A, destroying the remainder, and
that 2^-n be added to q if the (n+1)st digit of q is 1.

2. Quotient in Q with remainder in A and C. For this instruction,
the restrictions on r_n are such that q as formed in Q by a
mechanized process need not be modified in more than the least
significant digit.

The first instruction should suffice for the bulk of fractional
single-precision calculations requiring division; the second will be useful for multiple
precision or integer divisions.



8.21 Non-Restoring Division

The recursion relationship for the partial remainders for non-restoring
division is

    r_{k+1} = 2 r_k - (-1)^(p_k + y_0) y,

where r_k and y are partial remainder and divisor, with sign digits p_k and y_0,
respectively, and k = 0, 1, ..., n is the recursion index. For k = 0, r_0 is
the dividend; and for k = n, r_n is the remainder. The r_k are held in A and C,
y is held in M. From the recursion relationship, it can be shown that

    2^-n r_n = r_0 - y Σ_{i=1}^{n} 2^-i (-1)^(p_{i-1} + y_0).

The quotient digit z_i = +1 or -1 formed at each step is

    z_i = (-1)^(p_{i-1} + y_0),

and the quotient q is then

    q = Σ_{i=1}^{n} 2^-i z_i = Σ_{i=1}^{n} 2^-i (-1)^(p_{i-1} + y_0),

from which it readily follows that q y + 2^-n r_n = r_0.

It can also be established by induction on k that -|y| ≤ r_k < |y|,
provided -|y| ≤ r_0 < |y|. For the proof, consider two cases:

    -|y| ≤ r_k < 0    or    0 ≤ r_k < |y|.

Then

    -2|y| ≤ 2 r_k < 0    or    0 ≤ 2 r_k < 2|y|.

Noting that (-1)^(y_0) y = |y| and that -(-1)^(p_k) = +1 for r_k < 0
and -(-1)^(p_k) = -1 for r_k ≥ 0, it follows that, for the two cases:

    -|y| ≤ 2 r_k + |y| < |y|    or    -|y| ≤ 2 r_k - |y| < |y|,

which shows that

    -|y| ≤ r_{k+1} < |y|.

The quotient q with digits z_i = ±1 can easily be converted to the conventional
binary representation; the conversion requires that the least significant digit
q_n of the converted quotient be 1. If q_n is set to 0, the analysis above must
be modified as indicated by

    (q - 2^-n) y + 2^-n (r_n + y) = r_0,

i.e., a decrease in q by 2^-n corresponds to adding y to r_n.

The range restrictions on r_k apply in particular to the remainder r_n,
and may be rewritten directly if q_n = 1 or modified by addition of y to r_n if
q_n = 0.

                            q_n = 1                q_n = 0

    r_n < 0, y < 0        y ≤ r_n < 0          2y ≤ r_n + y < y
    r_n < 0, y > 0       -y ≤ r_n < 0           0 ≤ r_n + y < y
    r_n ≥ 0, y < 0        0 ≤ r_n < -y          y ≤ r_n + y < 0
    r_n ≥ 0, y > 0        0 ≤ r_n < y           y ≤ r_n + y < 2y

The above may be regarded as a list of all possible ranges of remainders
available without modification of more than the least significant digit q_n of q. Two
possibilities are:

1. q_n = 1,

2. q_n = 1 or 0 in accordance with agreement or disagreement,
respectively, of signs of r_n and y.

In either case, the undesirable situation r_n = y cannot be avoided. The
second choice seems preferable in that the remainder is less than y if y > 0;
furthermore the remainder and the divisor agree in sign. It should be noted
that either the remainder r_n or the conditionally modified remainder r_n + (1 - q_n) y
can be formed with relative simplicity. However, if it is required that a
remainder r'_n be found such that |r'_n| ≤ 1/2 |y|, the mechanization would be
difficult, since

1. An additional step of the division would be required, involving
destruction of r_n.

2. If r_n and y agree in sign, addition facilities may be required
to increase q.

The conclusion is that the division with remainder instruction be the second of
the two possibilities listed above, and that a distinct division instruction
resulting in a correctly rounded quotient without a remainder should also be
available.
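
The recursion and the second choice of remainder can be exercised with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the quotient is kept as a rational number rather than as register digits, and the sign digit of zero is taken to be 0.

```python
from fractions import Fraction


def sign_digit(x):
    """Sign digit of a number: 1 if negative, 0 otherwise."""
    return 1 if x < 0 else 0


def nonrestoring(r0, y, n):
    """n steps of r_{k+1} = 2 r_k - (-1)^(p_k + y_0) y.
    Returns the quotient built from digits +1 and -1, and the remainder r_n."""
    y0 = sign_digit(y)
    r, q = Fraction(r0), Fraction(0)
    for i in range(1, n + 1):
        z = (-1) ** (sign_digit(r) + y0)   # quotient digit, +1 or -1
        r = 2 * r - z * y
        q += Fraction(z, 2 ** i)
    return q, r


def divide_with_remainder(r0, y, n):
    """Second possibility above: q_n = 0 (q reduced by 2^-n, y added to r_n)
    when the signs of r_n and y disagree, q_n = 1 otherwise."""
    q, r = nonrestoring(r0, y, n)
    if sign_digit(r) != sign_digit(y):
        q -= Fraction(1, 2 ** n)
        r += y
    return q, r
```

For example, `divide_with_remainder(Fraction(5, 16), Fraction(-3, 4), 6)` returns a quotient and remainder satisfying q y + 2^-n r = r_0 exactly, with the remainder agreeing in sign with the divisor.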



8.22 The Standardized Division 

If the dividend r_0 and divisor y are initially shifted left so that y
lies in the range 1/2 ≤ |y| < 1, the quantities 2 r_k in non-restoring division
can be similarly standardized with a reduction in division time, whenever
|r_k| < 1/4. If m binary shifts are initially performed, the quotient and
remainder satisfy

    q (2^m y) + 2^-n r'_n = 2^m r_0,

indicating that q is unchanged, and that r'_n = 2^m r_n. Thus the correct remainder
r_n could only be obtained by a right shift of r'_n by m digital positions. The
method can therefore be applied most easily to a floating point division or to
a fixed point division which does not require a remainder.

The quotient digit insertion for non-restoring division must be
modified, as follows:

    Z_k = 1   if the subtraction r_{k-1} - y is performed,

    Z_k = 0   if a shift of r_{k-1} is executed,

    Z_k = -1  if the addition r_{k-1} + y is performed.

The conversion of the reversed ternary representation to the conventional binary
representation can be performed serially if one reversed ternary digit, say Z_k,
is held for one step and modified on the basis of the sign p_k of the next
partial remainder r_k. During standardization, of course, the signs of successive
partial remainders do not change; therefore, the partial remainder signs need be
determined only for those r_k immediately following an addition or subtraction.
The method of conversion of a single reversed ternary digit Z_k to the
corresponding binary digit q_{k-1} is:



(Table giving q_{k-1} as a function of Z_k and of the sign of the next partial remainder; the entries are not legible in this copy.)



Some idea of the reduction in division time can be gained from the
following calculation. If it is assumed that y is equally likely to be anywhere
in the range 1/2 ≤ |y| < 1 and that each r_k is equally likely to be
within the range 0 ≤ |r_k| < |y|, then the range of r_k such that at least one
use of the adder is eliminated is 0 ≤ |r_k| < 1/4, with a probability of 1/(4|y|).
The fractional reduction R in uses of the adder can be calculated by averaging
this probability over the interval [1/2, 1] of |y|:

    R = (1 / (1 - 1/2)) ∫_{1/2}^{1} d|y| / (4|y|) = (1/2) [ln |y|]_{1/2}^{1} = (1/2) ln 2 = 0.346
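
The average can be confirmed numerically. The sketch below applies a midpoint rule to the probability 1/(4|y|) over the interval [1/2, 1]; the function name `average_reduction` and the number of steps are arbitrary choices made for the illustration.

```python
import math


def average_reduction(steps=100_000):
    """Average of 1/(4|y|) for |y| uniformly distributed over [1/2, 1]."""
    lo, hi = 0.5, 1.0
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * width      # midpoint of the i-th subinterval
        total += 1.0 / (4.0 * y)
    return total * width / (hi - lo)


R = average_reduction()
closed_form = 0.5 * math.log(2)         # the value (1/2) ln 2 quoted above
```

Both values agree with the figure quoted in the text to three decimal places.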



8.23 Division Overflow

If the division process is begun by forming r_0 - (-1)^(p_0 + y_0) y, the
result is a numerical comparison of absolute values of divisor y and dividend
r_0. The sign of this result can then be used to determine the sign of the
quotient. The quotient sign can be independently determined by inspection of
signs of divisor and dividend. Overflow is indicated if the results of the
two methods of sign generation differ.



8.24 Quaternary Operation 

The theory of the arithmetic unit presented thus far has been expressed
in terms of binary operation. There appear to be a number of advantages to
quaternary operation of the arithmetic unit. Among these are:

1. The equipment required for the carry register would be halved.
Only base 4 carries would be stored, and shifts would involve a
displacement of carries over two binary digital positions.

2. Carry assimilation could be completed more quickly. With proper
attention to circuit details, the carry through a quaternary
digital position can be completed as quickly as a binary carry
propagation.

3. Gating time for shifts could be reduced in multiplication and in
the standardized division. The characteristics of the multiplier
recoding are such that at most one use of the adder is required
for each quaternary multiplier digit. During division, the
number of gating operations would be decreased in those instances
where standardization is possible.

With one exception, the theory of binary operation can be readily extended for
quaternary operation. For storage of carries, the general theory for arbitrary
radix r can be applied with the result that the existence of a carry implies
that the associated quaternary sum digit is 0. On the other hand, a modified
quasi-adder can be designed in such a way that the existence of a carry implies
that the most significant binary digit of the associated quaternary sum digit
is 0. The latter design is slightly simpler and faster.

Quaternary operation in multiplication and division requires that,
if m is the multiplier or divisor in M, the quantities m, -m, 2m, and -2m be
available for addition. Such a doubling and complementing circuit is
approximately twice as complicated as the complementing circuit required for binary
operation.

For arithmetic operations, it would be sufficient to provide gating 
circuits for quaternary shifting only, since the lack of binary shifts can in 
part be compensated for by use of the doubling circuit attached to M. On the 
other hand, binary shifts are preferable for logical operations. Further 
investigation is necessary to determine whether or not both binary and quaternary 
shifting facilities should be provided. 

8.25 Estimates of Operation Times and Hardware Requirements 

In the absence of large-scale experimental data, estimates of operation
times and hardware requirements are at best approximate. It is nonetheless
possible, if assumptions are consistent, to compare the relative merits of various
proposals for the arithmetic unit structure. The tabular material which
follows is sufficiently detailed so that the calculations of time and
hardware estimates can easily be repeated for design modifications necessitated
by experimental results.



The data on which the estimates are based are these:

    Circuit                      Transistors   Diodes   T + 1/2 D   Operation Time

    FLIPFLOP (f)                     4           11        9.5           -
    HALF ADDER (h)                   8            6       11            25 mµs
    SINGLE GATE (s)                  2.3          0.7      2.7         100 mµs
    DOUBLE GATE (d)                  4.3          1.2      4.9          50 mµs
    AND CIRCUIT (a)                  2            0        2             5 mµs
    OR CIRCUIT (o)                   2            2        3             5 mµs
    NOT CIRCUIT (n)                  2            4        4            15 mµs
    LEVEL RESTORER (r)               3            3        4.5          15 mµs
    COMPLEMENTING CIRCUIT (c)        6            2        7            10 mµs


NOTE 1 The half adder is composed of 2 ANDs, 1 OR, and 1 NOT circuit
and yields a sum s and carry c from two binary inputs x and y
according to the Boolean equations c = x · y, s = (x · y)' · (x ∨ y),
where the prime denotes complementation.

NOTE 2 The operation time of the flip flop is included in the gating
times. The time given for single gating includes an estimate
of clearing time as well.

NOTE 3 Equipment estimates for gates include equipment for gate driver
circuits.
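
Since the tables that follow are all built from these component figures, a short sketch of the bookkeeping may be useful. It assumes the component counts as read from the table above (with the diode count of the AND circuit taken as zero); the dictionary `CIRCUITS` and the functions `cost` and `half_adder` are names introduced only for this illustration.

```python
# (transistors, diodes) per circuit, keyed by the letter used in the text
CIRCUITS = {
    "f": (4, 11), "h": (8, 6), "s": (2.3, 0.7), "d": (4.3, 1.2),
    "a": (2, 0), "o": (2, 2), "n": (2, 4), "r": (3, 3), "c": (6, 2),
}


def cost(parts):
    """parts maps a circuit letter to a multiplicity, e.g. {"h": 2, "o": 1}.
    Returns (transistors, diodes, T + 1/2 D)."""
    t = sum(k * CIRCUITS[name][0] for name, k in parts.items())
    d = sum(k * CIRCUITS[name][1] for name, k in parts.items())
    return t, d, t + d / 2


def half_adder(x, y):
    """Sum and carry according to NOTE 1: c = x.y, s = (x.y)'.(x v y)."""
    c = x & y
    s = (1 - c) & (x | y)
    return s, c


# The half adder as 2 ANDs, 1 OR and 1 NOT, and an expression of the form 2h+o+0.2r+c.
half_adder_cost = cost({"a": 2, "o": 1, "n": 1})
adder_cost = cost({"h": 2, "o": 1, "r": 0.2, "c": 1})
```

The first result reproduces the half adder row of the table (8 transistors, 6 diodes, 11), which is a useful check that the component figures have been read consistently.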



The variations in design result from a number of choices which
include:

1. Method of gating during transfers or shifts,

2. Inclusion of storage facilities for carries,

3. Use of a carry completion signal, during either conventional
addition or carry assimilation,

4. Recoding of multiplier,

5. Quaternary operation,

6. Shifting number register.

In order to study the merits of the proposals, we determine the
hardware and speed characteristics of various arithmetic units, the first
(unit A) being a transistorized version of the Illiac, the remainder
embodying one or more proposals as follows:



    Unit   Method of   Separate        Carry        Multiplier          Base Used     Number
           Gating      Carry Storage   Completion   Sensing                           Register

     A     Single      No              No           Binary              Binary        Fixed
     B     Double      No              No           Binary              Binary        Fixed
     C     Double      Yes             Yes          Binary              Binary        Fixed
     D     Double      No              No           Reversed Ternary    Quaternary    Fixed
     E     Double      Yes             No           Reversed Ternary    Quaternary    Fixed
     F     Double      Yes             Yes          Reversed Ternary    Binary        Fixed
     G     Double      Yes             Yes          Reversed Ternary    Quaternary    Fixed
     H     Double      Yes             Yes          Reversed Ternary    Quaternary    Shifting



It should be noted that unit G is the unit proposed for fixed point
operation; units E and F are modifications of unit G, such that the merits of
carry completion circuitry and quaternary operation can be determined. The
equipment costs per bit for registers and gates are then:






    Unit   Register A   Q        M        C        Totals     Trans.   Diodes   T + 1/2 D

     A     2f+4s        2f+3s    f        -        5f+7s       36.2     59.8      66.1
     B     2f+4d        2f+3d    f        -        5f+7d       50.2     63.3      81.9
     C     2f+4d        2f+3d    f        2f+4d    7f+11d      75.4     90.0     120.4
     D     2f+4d        2f+3d    f        -        5f+7d       50.2     63.3      81.9
     E     2f+4d        2f+3d    f        f+2d     6f+9d       62.8     76.7     101.2
     F     2f+4d        2f+3d    f        2f+4d    7f+11d      75.4     90.0     120.4
     G     2f+4d        2f+3d    f        f+2d     6f+9d       62.8     76.7     101.2
     H     2f+4d        2f+3d    2f+3d    f+2d     7f+12d      79.7     91.2     125.3

and the equipment costs for adding, complementing, doubling or assimilating,
as the design requires, are:



    Unit   Adder or       Assimilator      Comp. or   Totals               Trans.   Diodes   T + 1/2 D
           Quasi-adder                     Doubling

     A     2h+o+0.2r      -                c          2h+o+0.2r+c           24.6     16.6      32.9
     B     2h+o+0.2r      -                c          2h+o+0.2r+c           24.6     16.6      32.9
     C     2h+o           h+4a+2o+0.3r     c          3h+4a+3o+0.3r+c       44.9     26.9      58.4
     D     2h+o+0.2r      -                2c         2h+o+0.2r+2c          30.6     18.6      39.9
     E     2h+o           h+o+0.2r         2c         3h+2o+0.2r+2c         40.6     26.6      53.9
     F     2h+o           h+4a+2o+0.3r     c          3h+4a+3o+0.3r+c       44.9     26.9      58.4
     G     2h+o           h+4a+2o+0.3r     2c         3h+4a+3o+0.3r+2c      50.9     28.9      65.4
     H     2h+o           h+4a+2o+0.3r     2c         3h+4a+3o+0.3r+2c      50.9     28.9      65.4



For our speed estimates of arithmetic operations, it is convenient to calculate
the time required for the basic operations, for an n bit arithmetic unit
(times in microseconds).

    Unit   Shift (t_s)   Adder or Quasi-adder with             Assimilation (t_c)
                         Complementing and Doubling (t_a)

     A     2s = 0.2      c+2h+n(a+o+0.2r) = 0.06+0.013n        -
     B     2d = 0.1      c+2h+n(a+o+0.2r) = 0.06+0.013n        -
     C     2d = 0.1      c+2h = 0.06                           h+a+log2 n (2a+o+0.3r) = 0.03+0.02 log2 n
     D     2d = 0.1      2c+2h+n(a+o+0.2r) = 0.07+0.013n       -
     E     2d = 0.1      2c+2h = 0.07                          h+0.5n(a+o+0.2r) = 0.025+0.007n
     F     2d = 0.1      c+2h = 0.06                           h+a+log2 n (2a+o+0.3r) = 0.03+0.02 log2 n
     G     2d = 0.1      2c+2h = 0.07                          h+a+0.5 log2 n (2a+o+0.3r) = 0.03+0.01 log2 n
     H     2d = 0.1      2c+2h = 0.07                          h+a+0.5 log2 n (2a+o+0.3r) = 0.03+0.01 log2 n

NOTE: For units C, F, G, and H, t_c is the average assimilate time. For the
maximum t_c, replace log2 n by n. Otherwise, maximum and average times are
the same.

Approximate formulas for operation time are then:



    Unit   Addition or    Assimi-   Multiplication                                   Division
           Subtraction    lation    Avg.                    Max.                     Avg.            Max.

     A     t_s + t_a      -         n(t_s + 1/2 t_a)        n(t_s + t_a)             n(t_s + t_a)    n(t_s + t_a)
     B     t_s + t_a      -         n(t_s + 1/2 t_a)        n(t_s + t_a)             n(t_s + t_a)    n(t_s + t_a)
     C     t_s + t_a      t_c       n(t_s + 1/2 t_a)        n(t_s + t_a)             Note 1          Note 1
     D     t_s + t_a      -         n(1/2 t_s + 1/3 t_a)    n(1/2 t_s + 1/2 t_a)     n(t_s + t_a)    n(t_s + t_a)
     E     t_s + t_a      t_c       n(1/2 t_s + 1/3 t_a)    n(1/2 t_s + 1/2 t_a)     Note 2          Note 2
     F     t_s + t_a      t_c       n(t_s + 1/3 t_a)        n(t_s + 1/2 t_a)         Note 1          Note 1
     G     t_s + t_a      t_c       n(1/2 t_s + 1/3 t_a)    n(1/2 t_s + 1/2 t_a)     Note 1          Note 1
     H     t_s + t_a      t_c       n(1/2 t_s + 1/3 t_a)    n(1/2 t_s + 1/2 t_a)     Note 3          Note 3

NOTE 1 For units C, F and G, the division time is
n[1/2 t_s + t_a + max(1/2 t_s, t'_c)], where t'_c is the time for
assimilation to the sign digit. Average and maximum division time
estimates are determined from average and maximum values of t'_c.

NOTE 2 For unit E, the division time is determined by the formula of NOTE 1,
except that t'_c is the assimilation time for all carries.

NOTE 3 For the average division time of unit H, multiply the formula of
NOTE 1 by (1 - 1/2 ln 2). For the maximum, add 1/2 n t_s to the
formula of NOTE 1.

The speed and hardware requirements are, then, for n = 52:



(times in microseconds)

    Unit   t_s    t_a      t_c             Add      Multiply         Divide             Total Equipment, 52 bits
                           Avg.    Max.             Avg.     Max.    Avg.      Max.     Trans.   Diodes   T + 1/2 D

     A     0.2    0.736    -       -       0.936    29.6     48.7    48.7      48.7      3160     3980      5150
     B     0.1    0.736    -       -       0.836    24.4     43.5    43.5      43.5      3890     4160      5970
     C     0.1    0.06     0.15    1.07    0.16      6.76     8.32   10-20*    61.5+     6250     6090      9300
     D     0.1    0.746    -       -       0.846    15.5     22.0    44.0      44.0      4200     4260      6330
     E     0.1    0.07     0.39    0.39    0.17      3.81     4.42   26.5      26.5      5370     5370      8060
     F     0.1    0.06     0.15    1.07    0.16      6.24     6.76   10-20*    61.5+     6250     6090      9300
     G     0.1    0.07     0.09    0.55    0.17      3.81     4.42   10-20*    35+       5910     5490      8660
     H     0.1    0.07     0.09    0.55    0.17      3.81     4.42   7-14      35+       6790     6250      9900

* These figures are guesses since the characteristics of numerical data are not sufficiently well
known for the calculation of t_c.

+ It is highly unlikely that these maxima would be obtained in any practical situation since they
assume a maximum t_c at each step.
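
The multiplication columns can be recomputed from the formulas of section 8.25. The sketch below is an illustration only: it uses the rounded coefficients printed in the tables above, so its results agree with the tabulated values only to within rounding, and the names `shift_time`, `add_time`, `MULT` and `multiply_time` are introduced here for convenience.

```python
def shift_time(unit):
    """t_s in microseconds: single gating for unit A, double gating otherwise."""
    return 0.2 if unit == "A" else 0.1


def add_time(unit, n):
    """t_a in microseconds, using the rounded coefficients of the timing table."""
    base = 0.07 if unit in "DEGH" else 0.06        # 2c+2h versus c+2h
    ripple = 0.013 * n if unit in "ABD" else 0.0   # carry propagation without carry storage
    return base + ripple


# (coefficient of t_s, coefficient of t_a on the average, coefficient of t_a at most)
MULT = {
    "A": (1, 1 / 2, 1), "B": (1, 1 / 2, 1), "C": (1, 1 / 2, 1),
    "F": (1, 1 / 3, 1 / 2),
    "D": (1 / 2, 1 / 3, 1 / 2), "E": (1 / 2, 1 / 3, 1 / 2),
    "G": (1 / 2, 1 / 3, 1 / 2), "H": (1 / 2, 1 / 3, 1 / 2),
}


def multiply_time(unit, n=52, worst=False):
    """Average (or maximum) multiplication time in microseconds for an n bit unit."""
    k_s, k_avg, k_max = MULT[unit]
    k_a = k_max if worst else k_avg
    return n * (k_s * shift_time(unit) + k_a * add_time(unit, n))
```

For instance, `multiply_time("G")` gives about 3.81 and `multiply_time("C", worst=True)` about 8.32, matching the corresponding table entries.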



8.26 Interpretation of Speed and Hardware Estimates

In sections 3.2 and 3.3, a criterion for the choice of one of
several alternate designs is presented. If T is a measure of the faultless
computer time required for solution of a problem, and n is the number of
equally reliable switching elements in the computer, then the criterion is
that a 1% increase in n should yield a 1.05% increase in speed. The criterion
is used here to evaluate the arithmetic unit proposals, with T interpreted
as the average multiplication time, and n interpreted as a measure of the
hardware requirements for the arithmetic unit. We may consider T to be a
measure of the time for solution of many problems; however, n should properly
be a measure of the number of switching elements in the entire computer rather
than in the arithmetic unit alone. However, if a proposal satisfies the
criterion with n interpreted as the amount of hardware in the arithmetic unit,
it will certainly satisfy the criterion with n a measure of the total computer
hardware, since a 1% increase in arithmetic hardware is less than a 1%
increase in total computer hardware.

Using the results of section 8.25, we compute, relative to Unit A,



    Units     ΔT        T       ΔT/T      Δn       n      Δn/n    nΔT/(TΔn)

     B/A     -5.2     24.4     0.213      820    5150    0.159      1.34
     C/A    -22.84     6.76    3.38      4150    5150    0.806      4.19
     D/A    -14.1     15.5     0.909     1180    5150    0.229      3.97
     F/A    -23.36     6.24    3.74      4150    5150    0.806      4.64
     G/A    -25.59     3.81    6.72      3510    5150    0.682      9.85
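
The last column of the table can be recomputed from the 52-bit estimates of section 8.25. The sketch below takes the average multiplication time as T and the 52-bit T + 1/2 D figure as n, as the text does; the dictionary `ESTIMATES` and the function `criterion` are names introduced for the illustration, and the recomputed ratios agree with the printed ones to within about one percent.

```python
# unit: (average multiplication time in microseconds, T + 1/2 D for 52 bits)
ESTIMATES = {
    "A": (29.6, 5150), "B": (24.4, 5970), "C": (6.76, 9300),
    "D": (15.5, 6330), "F": (6.24, 9300), "G": (3.81, 8660),
}


def criterion(unit, base="A"):
    """Ratio (|dT| / T) / (dn / n): fractional gain in speed per fractional
    increase in hardware, relative to the base unit."""
    t, n = ESTIMATES[unit]
    t0, n0 = ESTIMATES[base]
    speed_gain = (t0 - t) / t          # |dT| / T, with T the time of the new unit
    hardware_gain = (n - n0) / n0      # dn / n, with n the hardware of the base unit
    return speed_gain / hardware_gain


ratios = {u: criterion(u) for u in ("B", "C", "D", "F", "G")}
best = max(ratios, key=ratios.get)
```

Every ratio exceeds the 1.05 demanded by the criterion, and the largest belongs to unit G.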



Thus, Unit G, which results in a six-fold increase in the multiplication rate
for a 2/3 increase in hardware, is best by the criterion. It may be noted
that units E and H are not included in the above table since the effects of
carry assimilation time and division time are refinements outside the realm
of the relatively crude approach of this section. Furthermore, the
feasibility of the fast division for unit H is dependent on shifting facilities
for register M; the decision as to whether or not M should be a shifting
register is properly related to whether or not floating point arithmetic
facilities are available.

8.27 Floating Point Arithmetic

Although the details of the mechanization of floating point operations 
are subject to the results of future investigations, it is nonetheless true that 
certain requirements are imposed on the arithmetic unit structure independent of 
the details. A number in floating point consists of a fractional part and an 
exponent. Two such floating point numbers are, as far as their fractional parts 
are concerned, treated in much the same way as fixed point quantities, except 
that values of exponents may require preliminary operations for addition and 
subtraction, and that addition or subtraction facilities are required for 
exponents in multiplication and division. Furthermore, standardization of re- 
sults may be necessary. 

For floating point operation, it is proposed that: 

1. Fractional parts of floating point numbers should be expanded to 
fixed point precision in the arithmetic unit. This feature tends 
to reduce round-off errors if, as is proposed in Chapter 3, 
extra registers are provided for the arithmetic unit such that 
relatively complicated operations are performed in the arithmetic 
unit with a minimum of storage references. 

2. Shifting facilities should be provided for the memory register M.
For floating point addition, for example, either the addend in M or
the augend in A must be shifted, depending on the relative size of
the exponents. If M should be shifted, the alternatives are to
either

a. provide shifting facilities in M, or

b. transfer the number in M to Q, thereby destroying the quantity
in Q, shift in Q, and transfer the shifted result to M.

Since the gates for transfers between M and Q are as complex as gates for shifting
M, and since further advantages in division and in retaining the contents of Q
accrue if the former choice is made, we propose that M be constructed as a
shifting register.

With these choices in mind, we conclude that a floating point arithmetic
unit requires the following hardware over and above that required for a fixed
point unit:

1. Addition and subtraction facilities for exponents. It may be
possible to time-share these facilities with the address
modification registers (B-lines).

2. Shifting facilities for memory register M.

3. Suitable control facilities for standardization, etc.

4. Storage facilities for exponents associated with operands. In
some cases counters are also required for exponents.

From the arithmetic unit designer's viewpoint, floating point operation
utilizes the equipment required for fixed point operation, plus equipment for
facilities described in the previous paragraph. Thus, a complete theory of fixed
point operation including overflow analysis determines the major portion of the
characteristics of a floating point unit.
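
The preliminary shift and the final standardization mentioned above can be pictured with a very small model. The sketch below is not the mechanization proposed in this report, whose details are left to future investigation; it assumes fractions held as signed integers scaled by 2^bits, truncation on right shifts, and standardization of the magnitude to the range 1/2 ≤ |fraction| < 1. The name `fp_add` and its parameters are hypothetical.

```python
def fp_add(frac_a, exp_a, frac_m, exp_m, bits=8):
    """Add two numbers of the form (frac / 2**bits) * 2**exp.
    frac_a plays the part of the augend in A, frac_m that of the addend in M.
    Returns a standardized (fraction, exponent) pair."""
    # Shift right whichever operand has the smaller exponent until the exponents agree.
    if exp_a >= exp_m:
        frac_m >>= exp_a - exp_m
        exp = exp_a
    else:
        frac_a >>= exp_m - exp_a
        exp = exp_m
    total = frac_a + frac_m
    one = 1 << bits
    if abs(total) >= one:                          # fraction overflow: one right shift
        total >>= 1
        exp += 1
    while total != 0 and abs(total) < one // 2:    # standardize by left shifts
        total <<= 1
        exp -= 1
    return total, exp
```

For example, `fp_add(192, 0, 128, -1)` adds 0.75 and 0.25 and returns `(128, 1)`, that is, 0.5 × 2. The operand chosen for the preliminary shift depends only on the relative size of the exponents, which is why shifting facilities are wanted for M as well as for A.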
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